Springer Finance




Steven E. Shreve 




Steven Shreve: Stochastic Calculus and Finance 


PRASAD CHALASANI SOMESH JHA 
Carnegie Mellon University Carnegie Mellon University 
chal@cs.cmu.edu sjha@cs.cmu.edu


© Copyright: Steven E. Shreve, 1996


July 25, 1997 


Contents 


1 Introduction to Probability Theory
    1.1 The Binomial Asset Pricing Model
    1.2 Finite Probability Spaces
    1.3 Lebesgue Measure and the Lebesgue Integral
    1.4 General Probability Spaces
    1.5 Independence
        1.5.1 Independence of sets
        1.5.2 Independence of σ-algebras
        1.5.3 Independence of random variables
        1.5.4 Correlation and independence
        1.5.5 Independence and conditional expectation
        1.5.6 Law of Large Numbers
        1.5.7 Central Limit Theorem

2 Conditional Expectation
    2.1 A Binomial Model for Stock Price Dynamics
    2.2 Information
    2.3 Conditional Expectation
        2.3.1 An example
        2.3.2 Definition of Conditional Expectation
        2.3.3 Further discussion of Partial Averaging
        2.3.4 Properties of Conditional Expectation
        2.3.5 Examples from the Binomial Model
    2.4 Martingales

3 Arbitrage Pricing
    3.1 Binomial Pricing
    3.2 General one-step APT
    3.3 Risk-Neutral Probability Measure
        3.3.1 Portfolio Process
        3.3.2 Self-financing Value of a Portfolio Process Δ
    3.4 Simple European Derivative Securities
    3.5 The Binomial Model is Complete

4 The Markov Property
    4.1 Binomial Model Pricing and Hedging
    4.2 Computational Issues
    4.3 Markov Processes
        4.3.1 Different ways to write the Markov property
    4.4 Showing that a process is Markov
    4.5 Application to Exotic Options

5 Stopping Times and American Options
    5.1 American Pricing
    5.2 Value of Portfolio Hedging an American Option
    5.3 Information up to a Stopping Time

6 Properties of American Derivative Securities
    6.1 The properties
    6.2 Proofs of the Properties
    6.3 Compound European Derivative Securities
    6.4 Optimal Exercise of American Derivative Security

7 Jensen’s Inequality
    7.1 Jensen’s Inequality for Conditional Expectations
    7.2 Optimal Exercise of an American Call
    7.3 Stopped Martingales

8 Random Walks
    8.1 First Passage Time
    8.2 τ is almost surely finite
    8.3 The moment generating function for τ
    8.4 Expectation of τ
    8.5 The Strong Markov Property
    8.6 General First Passage Times
    8.7 Example: Perpetual American Put
    8.8 Difference Equation
    8.9 Distribution of First Passage Times
    8.10 The Reflection Principle

9 Pricing in terms of Market Probabilities: The Radon-Nikodym Theorem
    9.1 Radon-Nikodym Theorem
    9.2 Radon-Nikodym Martingales
    9.3 The State Price Density Process
    9.4 Stochastic Volatility Binomial Model
    9.5 Another Application of the Radon-Nikodym Theorem

10 Capital Asset Pricing
    10.1 An Optimization Problem

11 General Random Variables
    11.1 Law of a Random Variable
    11.2 Density of a Random Variable
    11.3 Expectation
    11.4 Two random variables
    11.5 Marginal Density
    11.6 Conditional Expectation
    11.7 Conditional Density
    11.8 Multivariate Normal Distribution
    11.9 Bivariate normal distribution

12 Semi-Continuous Models
    12.1 Discrete-time Brownian Motion
    12.2 The Stock Price Process
    12.3 Remainder of the Market
    12.4 Risk-Neutral Measure
    12.5 Risk-Neutral Pricing
    12.6 Arbitrage
    12.7 Stalking the Risk-Neutral Measure
    12.8 Pricing a European Call

13 Brownian Motion
    13.1 Symmetric Random Walk
    13.2 The Law of Large Numbers
    13.3 Central Limit Theorem
    13.4 Brownian Motion as a Limit of Random Walks
    13.5 Brownian Motion
    13.6 Covariance of Brownian Motion
    13.7 Finite-Dimensional Distributions of Brownian Motion
    13.8 Filtration generated by a Brownian Motion
    13.9 Martingale Property
    13.10 The Limit of a Binomial Model
    13.11 Starting at Points Other Than 0
    13.12 Markov Property for Brownian Motion
    13.13 Transition Density
    13.14 First Passage Time

14 The Itô Integral
    14.1 Brownian Motion
    14.2 First Variation
    14.3 Quadratic Variation
    14.4 Quadratic Variation as Absolute Volatility
    14.5 Construction of the Itô Integral
    14.6 Itô integral of an elementary integrand
    14.7 Properties of the Itô integral of an elementary process
    14.8 Itô integral of a general integrand
    14.9 Properties of the (general) Itô integral
    14.10 Quadratic variation of an Itô integral

15 Itô’s Formula
    15.1 Itô’s formula for one Brownian motion
    15.2 Derivation of Itô’s formula
    15.3 Geometric Brownian motion
    15.4 Quadratic variation of geometric Brownian motion
    15.5 Volatility of Geometric Brownian motion
    15.6 First derivation of the Black-Scholes formula
    15.7 Mean and variance of the Cox-Ingersoll-Ross process
    15.8 Multidimensional Brownian Motion
    15.9 Cross-variations of Brownian motions
    15.10 Multi-dimensional Itô formula

16 Markov processes and the Kolmogorov equations
    16.1 Stochastic Differential Equations
    16.2 Markov Property
    16.3 Transition density
    16.4 The Kolmogorov Backward Equation
    16.5 Connection between stochastic calculus and KBE
    16.6 Black-Scholes
    16.7 Black-Scholes with price-dependent volatility

17 Girsanov’s theorem and the risk-neutral measure
    17.1 Conditional expectations under $\widetilde{\mathbb{P}}$
    17.2 Risk-neutral measure

18 Martingale Representation Theorem
    18.1 Martingale Representation Theorem
    18.2 A hedging application
    18.3 d-dimensional Girsanov Theorem
    18.4 d-dimensional Martingale Representation Theorem
    18.5 Multi-dimensional market model

19 A two-dimensional market model
    19.1 Hedging when $-1 < \rho < 1$
    19.2 Hedging when $\rho = 1$

20 Pricing Exotic Options
    20.1 Reflection principle for Brownian motion
    20.2 Up and out European call
    20.3 A practical issue

21 Asian Options
    21.1 Feynman-Kac Theorem
    21.2 Constructing the hedge
    21.3 Partial average payoff Asian option

22 Summary of Arbitrage Pricing Theory
    22.1 Binomial model, Hedging Portfolio
    22.2 Setting up the continuous model
    22.3 Risk-neutral pricing and hedging
    22.4 Implementation of risk-neutral pricing and hedging

23 Recognizing a Brownian Motion
    23.1 Identifying volatility and correlation
    23.2 Reversing the process

24 An outside barrier option
    24.1 Computing the option value
    24.2 The PDE for the outside barrier option
    24.3 The hedge

25 American Options
    25.1 Preview of perpetual American put
    25.2 First passage times for Brownian motion: first method
    25.3 Drift adjustment
    25.4 Drift-adjusted Laplace transform
    25.5 First passage times: Second method
    25.6 Perpetual American put
    25.7 Value of the perpetual American put
    25.8 Hedging the put
    25.9 Perpetual American contingent claim
    25.10 Perpetual American call
    25.11 Put with expiration
    25.12 American contingent claim with expiration

26 Options on dividend-paying stocks
    26.1 American option with convex payoff function
    26.2 Dividend paying stock
    26.3 Hedging at time $t_1$

27 Bonds, forward contracts and futures
    27.1 Forward contracts
    27.2 Hedging a forward contract
    27.3 Future contracts
    27.5 Forward-future spread
    27.6 Backwardation and contango

28 Term-structure models
    28.1 Computing arbitrage-free bond prices: first method
    28.2 Some interest-rate dependent assets
    28.3 Terminology
    28.4 Forward rate agreement
    28.5 Recovering the interest r(t) from the forward rate
    28.6 Computing arbitrage-free bond prices: Heath-Jarrow-Morton method
    28.7 Checking for absence of arbitrage
    28.8 Implementation of the Heath-Jarrow-Morton model

29 Gaussian processes
    29.1 An example: Brownian Motion

30 Hull and White model
    30.1 Fiddling with the formulas
    30.2 Dynamics of the bond price
    30.3 Calibration of the Hull & White model
    30.4 Option on a bond

31 Cox-Ingersoll-Ross model
    31.1 Equilibrium distribution of r(t)
    31.2 Kolmogorov forward equation
    31.3 Cox-Ingersoll-Ross equilibrium density
    31.4 Bond prices in the CIR model
    31.5 Option on a bond
    31.6 Deterministic time change of CIR model
    31.7 Calibration

32 A two-factor model (Duffie & Kan)
    32.1 Non-negativity of Y
    32.2 Zero-coupon bond prices
    32.3 Calibration

33 Change of numéraire
    33.1 Bond price as numéraire
    33.2 Stock price as numéraire
    33.3 Merton option pricing formula

34 Brace-Gatarek-Musiela model
    34.1 Review of HJM under risk-neutral $\mathbb{P}$
    34.2 Brace-Gatarek-Musiela model
    34.3 LIBOR
    34.4 Forward LIBOR
    34.5 The dynamics of $L(t, \tau)$
    34.6 Implementation of BGM
    34.7 Bond prices
    34.8 Forward LIBOR under more forward measure
    34.9 Pricing an interest rate caplet
    34.10 Pricing an interest rate cap
    34.11 Calibration of BGM
    34.12 Long rates
    34.13 Pricing a swap




Chapter 1 


Introduction to Probability Theory 


1.1 The Binomial Asset Pricing Model


The binomial asset pricing model provides a powerful tool to understand arbitrage pricing theory 
and probability theory. In this course, we shall use it for both these purposes. 


In the binomial asset pricing model, we model stock prices in discrete time, assuming that at each
step, the stock price will change to one of two possible values. Let us begin with an initial positive
stock price $S_0$. There are two positive numbers, $d$ and $u$, with

\[ 0 < d < u, \tag{1.1} \]

such that at the next period, the stock price will be either $dS_0$ or $uS_0$. Typically, we take $d$ and $u$
to satisfy $0 < d < 1 < u$, so change of the stock price from $S_0$ to $dS_0$ represents a downward
movement, and change of the stock price from $S_0$ to $uS_0$ represents an upward movement. It is
common to also have $d = 1/u$, and this will be the case in many of our examples. However, strictly
speaking, for what we are about to do we need to assume only (1.1) and (1.2) below.


Of course, stock price movements are much more complicated than indicated by the binomial asset 
pricing model. We consider this simple model for three reasons. First of all, within this model the 
concept of arbitrage pricing and its relation to risk-neutral pricing is clearly illuminated. Secondly, 
the model is used in practice because with a sufficient number of steps, it provides a good, compu- 
tationally tractable approximation to continuous-time models. Thirdly, within the binomial model 
we can develop the theory of conditional expectations and martingales which lies at the heart of 
continuous-time models. 


With this third motivation in mind, we develop notation for the binomial model which is a bit
different from that normally found in practice. Let us imagine that we are tossing a coin, and when
we get a “Head,” the stock price moves up, but when we get a “Tail,” the price moves down. We
denote the price at time 1 by $S_1(H) = uS_0$ if the toss results in head (H), and by $S_1(T) = dS_0$ if it
results in tail (T).

Figure 1.1: Binomial tree of stock prices with $S_0 = 4$, $u = 1/d = 2$. [Tree diagram not reproduced;
recoverable node labels: $S_1(H) = 8$, $S_2(HT) = 4$, $S_2(TH) = 4$.]

After the second toss, the price will be one of:


\[ S_2(HH) = uS_1(H) = u^2 S_0, \qquad S_2(HT) = dS_1(H) = duS_0, \]

\[ S_2(TH) = uS_1(T) = udS_0, \qquad S_2(TT) = dS_1(T) = d^2 S_0. \]


After three tosses, there are eight possible coin sequences, although not all of them result in different 
stock prices at time 3. 
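
The count of sequences and of distinct prices can be checked by brute force. The following is a
minimal sketch in Python, assuming the values $S_0 = 4$, $u = 2$, $d = 1/2$ used in Figure 1.1; the
function name `stock_price` is illustrative and not part of the notes.

```python
from itertools import product

def stock_price(path, s0=4.0, u=2.0, d=0.5):
    """Price after the tosses in `path`, a string of 'H' and 'T'."""
    price = s0
    for toss in path:
        price *= u if toss == "H" else d
    return price

# All eight three-toss sequences and the distinct time-3 prices they produce.
paths = ["".join(p) for p in product("HT", repeat=3)]
prices_at_3 = {path: stock_price(path) for path in paths}
distinct_prices = sorted(set(prices_at_3.values()))
print(len(paths), distinct_prices)
```

The price depends only on how many heads occur, not on their order, which is why eight sequences
give only four distinct prices at time 3.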


For the moment, let us assume that the third toss is the last one and denote by

\[ \Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\} \]

the set of all possible outcomes of the three tosses. The set $\Omega$ of all possible outcomes of a
random experiment is called the sample space for the experiment, and the elements $\omega$ of $\Omega$ are called
sample points. In this case, each sample point $\omega$ is a sequence of length three. We denote the $k$-th
component of $\omega$ by $\omega_k$. For example, when $\omega = HTH$, we have $\omega_1 = H$, $\omega_2 = T$ and $\omega_3 = H$.


The stock price $S_k$ at time $k$ depends on the coin tosses. To emphasize this, we often write $S_k(\omega)$.
Actually, this notation does not quite tell the whole story, for while $S_3$ depends on all of $\omega$, $S_2$
depends on only the first two components of $\omega$, $S_1$ depends on only the first component of $\omega$, and
$S_0$ does not depend on $\omega$ at all. Sometimes we will use notation such as $S_2(\omega_1, \omega_2)$ just to record
more explicitly how $S_2$ depends on $\omega = (\omega_1, \omega_2, \omega_3)$.

Example 1.1 Set $S_0 = 4$, $u = 2$ and $d = \frac{1}{2}$. We have then the binomial “tree” of possible stock
prices shown in Fig. 1.1. Each sample point $\omega = (\omega_1, \omega_2, \omega_3)$ represents a path through the tree.
Thus, we can think of the sample space $\Omega$ as either the set of all possible outcomes from three coin
tosses or as the set of all possible paths through the tree.


To complete our binomial asset pricing model, we introduce a money market with interest rate $r$;
one dollar invested in the money market becomes $1 + r$ dollars in the next period. We take $r$ to be
the interest rate for both borrowing and lending. (This is not as ridiculous as it first seems, because in
many applications of the model, an agent is either borrowing or lending (not both) and knows in advance
which she will be doing; in such an application, she should take $r$ to be the rate of interest for her
activity.) We assume that

\[ d < 1 + r < u. \tag{1.2} \]


The model would not make sense if we did not have this condition. For example, if $1 + r \geq u$, then
the rate of return on the money market is always at least as great as and sometimes greater than the
return on the stock, and no one would invest in the stock. The inequality $d \geq 1 + r$ cannot happen
unless either $r$ is negative (which never happens, except maybe once upon a time in Switzerland) or
$d \geq 1$. In the latter case, the stock does not really go “down” if we get a tail; it just goes up less
than if we had gotten a head. One should borrow money at interest rate $r$ and invest in the stock,
since even in the worst case, the stock price rises at least as fast as the debt used to buy it.
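
The borrow-and-buy argument can be illustrated numerically. This is a small sketch in Python with
assumed numbers chosen so that the upper part of (1.2) holds but $d$ is at least $1 + r$; the function
name and the parameter values are hypothetical and serve only to illustrate the remark.

```python
def borrow_and_buy_profit(s0, u, d, r):
    """Time-1 profit in each state from borrowing s0 at rate r and buying one share."""
    debt = (1 + r) * s0
    return {"H": u * s0 - debt, "T": d * s0 - debt}

# Assumed numbers with d = 1.1 >= 1 + r = 1.05, so the lower inequality in (1.2) fails.
profit = borrow_and_buy_profit(s0=4.0, u=2.0, d=1.1, r=0.05)
print(profit)
```

With zero initial capital the profit is nonnegative in both states and strictly positive in at least
one, which is exactly the kind of arbitrage that condition (1.2) rules out.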


With the stock as the underlying asset, let us consider a European call option with strike price
$K > 0$ and expiration time 1. This option confers the right to buy the stock at time 1 for $K$ dollars,
and so is worth $S_1 - K$ at time 1 if $S_1 - K$ is positive and is otherwise worth zero. We denote by

\[ V_1(\omega) = (S_1(\omega) - K)^+ \triangleq \max\{S_1(\omega) - K, 0\} \]

the value (payoff) of this option at expiration. Of course, $V_1(\omega)$ actually depends only on $\omega_1$, and
we can and do sometimes write $V_1(\omega_1)$ rather than $V_1(\omega)$. Our first task is to compute the arbitrage
price of this option at time zero.


Suppose at time zero you sell the call for $V_0$ dollars, where $V_0$ is still to be determined. You now
have an obligation to pay off $(uS_0 - K)^+$ if $\omega_1 = H$ and to pay off $(dS_0 - K)^+$ if $\omega_1 = T$. At
the time you sell the option, you don’t yet know which value $\omega_1$ will take. You hedge your short
position in the option by buying $\Delta_0$ shares of stock, where $\Delta_0$ is still to be determined. You can use
the proceeds $V_0$ of the sale of the option for this purpose, and then borrow if necessary at interest
rate $r$ to complete the purchase. If $V_0$ is more than necessary to buy the $\Delta_0$ shares of stock, you
invest the residual money at interest rate $r$. In either case, you will have $V_0 - \Delta_0 S_0$ dollars invested
in the money market, where this quantity might be negative. You will also own $\Delta_0$ shares of stock.


If the stock goes up, the value of your portfolio (excluding the short position in the option) is

\[ \Delta_0 S_1(H) + (1 + r)(V_0 - \Delta_0 S_0), \]

and you need to have $V_1(H)$. Thus, you want to choose $V_0$ and $\Delta_0$ so that

\[ V_1(H) = \Delta_0 S_1(H) + (1 + r)(V_0 - \Delta_0 S_0). \tag{1.3} \]

If the stock goes down, the value of your portfolio is

\[ \Delta_0 S_1(T) + (1 + r)(V_0 - \Delta_0 S_0), \]

and you need to have $V_1(T)$. Thus, you want to choose $V_0$ and $\Delta_0$ to also have

\[ V_1(T) = \Delta_0 S_1(T) + (1 + r)(V_0 - \Delta_0 S_0). \tag{1.4} \]




These are two equations in two unknowns, and we solve them below.

Subtracting (1.4) from (1.3), we obtain

\[ V_1(H) - V_1(T) = \Delta_0 (S_1(H) - S_1(T)), \tag{1.5} \]

so that

\[ \Delta_0 = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)}. \tag{1.6} \]

This is a discrete-time version of the famous “delta-hedging” formula for derivative securities,
according to which the number of shares of an underlying asset a hedge should hold is the derivative
(in the sense of calculus) of the value of the derivative security with respect to the price of the
underlying asset. This formula is so pervasive that when a practitioner says “delta”, she means the
derivative (in the sense of calculus) just described. Note, however, that my definition of $\Delta_0$ is the
number of shares of stock one holds at time zero, and (1.6) is a consequence of this definition, not
the definition of $\Delta_0$ itself. Depending on how uncertainty enters the model, there can be cases
in which the number of shares of stock a hedge should hold is not the (calculus) derivative of the
derivative security with respect to the price of the underlying asset.


To complete the solution of (1.3) and (1.4), we substitute (1.6) into either (1.3) or (1.4) and solve
for $V_0$. After some simplification, this leads to the formula

\[ V_0 = \frac{1}{1 + r} \left[ \frac{1 + r - d}{u - d} V_1(H) + \frac{u - (1 + r)}{u - d} V_1(T) \right]. \tag{1.7} \]

This is the arbitrage price for the European call option with payoff $V_1$ at time 1. To simplify this
formula, we define

\[ \tilde{p} \triangleq \frac{1 + r - d}{u - d}, \qquad \tilde{q} \triangleq \frac{u - (1 + r)}{u - d} = 1 - \tilde{p}, \tag{1.8} \]

so that (1.7) becomes

\[ V_0 = \frac{1}{1 + r} \left[ \tilde{p} V_1(H) + \tilde{q} V_1(T) \right]. \tag{1.9} \]

Because we have taken $d < u$, both $\tilde{p}$ and $\tilde{q}$ are defined, i.e., the denominator in (1.8) is not zero.
Because of (1.2), both $\tilde{p}$ and $\tilde{q}$ are in the interval $(0, 1)$, and because they sum to 1, we can regard
them as probabilities of $H$ and $T$, respectively. They are the risk-neutral probabilities. They
appeared when we solved the two equations (1.3) and (1.4), and have nothing to do with the actual
probabilities of getting $H$ or $T$ on the coin tosses. In fact, at this point, they are nothing more than
a convenient tool for writing (1.7) as (1.9).
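
The one-period computation can be sketched in a few lines of Python. This is an illustration under
stated assumptions, not part of the notes: $S_0$, $u$ and $d$ follow Example 1.1, while the interest rate
$r = 0.25$ and the strike $K = 5$ are assumed values, and the function name `one_period_price` is
hypothetical.

```python
def one_period_price(s0, u, d, r, payoff):
    """Return (delta0, v0, p_tilde, q_tilde) for a one-period binomial model."""
    if not (0 < d < 1 + r < u):
        raise ValueError("need 0 < d < 1 + r < u (conditions (1.1) and (1.2))")
    s1_h, s1_t = u * s0, d * s0
    v1_h, v1_t = payoff(s1_h), payoff(s1_t)
    delta0 = (v1_h - v1_t) / (s1_h - s1_t)            # formula (1.6)
    p_tilde = (1 + r - d) / (u - d)                   # formula (1.8)
    q_tilde = (u - (1 + r)) / (u - d)
    v0 = (p_tilde * v1_h + q_tilde * v1_t) / (1 + r)  # formula (1.9)
    return delta0, v0, p_tilde, q_tilde

# Illustrative inputs: S0, u, d as in Example 1.1; r = 0.25 and K = 5 are assumed.
strike = 5.0
delta0, v0, p_tilde, q_tilde = one_period_price(
    4.0, 2.0, 0.5, 0.25, lambda s: max(s - strike, 0.0)
)

# Replication check: the hedged portfolio matches the payoff in both states,
# which is what equations (1.3) and (1.4) require.
cash = v0 - delta0 * 4.0
portfolio_h = delta0 * 8.0 + 1.25 * cash
portfolio_t = delta0 * 2.0 + 1.25 * cash
print(delta0, v0, portfolio_h, portfolio_t)
```

The replication check at the end is the point of the exercise: the price $V_0$ is whatever initial
capital makes the stock-plus-money-market portfolio reproduce the option payoff in both states.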


We now consider a European call with strike price $K$ dollars and expiration time 2. At expiration,
the payoff of this option is $V_2 \triangleq (S_2 - K)^+$, where $V_2$ and $S_2$ depend on $\omega_1$ and $\omega_2$, the first and
second coin tosses. We want to determine the arbitrage price for this option at time zero. Suppose an
agent sells the option at time zero for $V_0$ dollars, where $V_0$ is still to be determined. She then buys
$\Delta_0$ shares of stock, investing $V_0 - \Delta_0 S_0$ dollars in the money market to finance this. At time 1, the
agent has a portfolio (excluding the short position in the option) valued at

\[ X_1 \triangleq \Delta_0 S_1 + (1 + r)(V_0 - \Delta_0 S_0). \tag{1.10} \]


Although we do not indicate it in the notation, $S_1$ and therefore $X_1$ depend on $\omega_1$, the outcome of
the first coin toss. Thus, there are really two equations implicit in (1.10):

\begin{align*}
X_1(H) &\triangleq \Delta_0 S_1(H) + (1 + r)(V_0 - \Delta_0 S_0), \\
X_1(T) &\triangleq \Delta_0 S_1(T) + (1 + r)(V_0 - \Delta_0 S_0).
\end{align*}


After the first coin toss, the agent has $X_1$ dollars and can readjust her hedge. Suppose she decides to
now hold $\Delta_1$ shares of stock, where $\Delta_1$ is allowed to depend on $\omega_1$ because the agent knows what
value $\omega_1$ has taken. She invests the remainder of her wealth, $X_1 - \Delta_1 S_1$, in the money market. In
the next period, her wealth will be given by the right-hand side of the following equation, and she
wants it to be $V_2$. Therefore, she wants to have

\[ V_2 = \Delta_1 S_2 + (1 + r)(X_1 - \Delta_1 S_1). \tag{1.11} \]


Although we do not indicate it in the notation, $S_2$ and $V_2$ depend on $\omega_1$ and $\omega_2$, the outcomes of the
first two coin tosses. Considering all four possible outcomes, we can write (1.11) as four equations:

\begin{align*}
V_2(HH) &= \Delta_1(H) S_2(HH) + (1 + r)(X_1(H) - \Delta_1(H) S_1(H)), \\
V_2(HT) &= \Delta_1(H) S_2(HT) + (1 + r)(X_1(H) - \Delta_1(H) S_1(H)), \\
V_2(TH) &= \Delta_1(T) S_2(TH) + (1 + r)(X_1(T) - \Delta_1(T) S_1(T)), \\
V_2(TT) &= \Delta_1(T) S_2(TT) + (1 + r)(X_1(T) - \Delta_1(T) S_1(T)).
\end{align*}

We now have six equations, the two represented by (1.10) and the four represented by (1.11), in the
six unknowns $V_0$, $\Delta_0$, $\Delta_1(H)$, $\Delta_1(T)$, $X_1(H)$, and $X_1(T)$.


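To solve these equations, and thereby determine the arbitrage price $V_0$ at time zero of the option and
the hedging portfolio $\Delta_0$, $\Delta_1(H)$ and $\Delta_1(T)$, we begin with the last two

\begin{align*}
V_2(TH) &= \Delta_1(T) S_2(TH) + (1 + r)(X_1(T) - \Delta_1(T) S_1(T)), \\
V_2(TT) &= \Delta_1(T) S_2(TT) + (1 + r)(X_1(T) - \Delta_1(T) S_1(T)).
\end{align*}

Subtracting one of these from the other and solving for $\Delta_1(T)$, we obtain the “delta-hedging
formula”

\[ \Delta_1(T) = \frac{V_2(TH) - V_2(TT)}{S_2(TH) - S_2(TT)}, \tag{1.12} \]

and substituting this into either equation, we can solve for

\[ X_1(T) = \frac{1}{1 + r} \left[ \tilde{p} V_2(TH) + \tilde{q} V_2(TT) \right]. \tag{1.13} \]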




Equation (1.13) gives the value the hedging portfolio should have at time 1 if the stock goes down
between times 0 and 1. We define this quantity to be the arbitrage value of the option at time 1 if
$\omega_1 = T$, and we denote it by $V_1(T)$. We have just shown that

\[ V_1(T) \triangleq \frac{1}{1 + r} \left[ \tilde{p} V_2(TH) + \tilde{q} V_2(TT) \right]. \tag{1.14} \]

The hedger should choose her portfolio so that her wealth $X_1(T)$ if $\omega_1 = T$ agrees with $V_1(T)$
defined by (1.14). This formula is analogous to formula (1.9), but postponed by one step. The first
two equations implicit in (1.11) lead in a similar way to the formulas

\[ \Delta_1(H) = \frac{V_2(HH) - V_2(HT)}{S_2(HH) - S_2(HT)} \tag{1.15} \]

and $X_1(H) = V_1(H)$, where $V_1(H)$ is the value of the option at time 1 if $\omega_1 = H$, defined by

\[ V_1(H) \triangleq \frac{1}{1 + r} \left[ \tilde{p} V_2(HH) + \tilde{q} V_2(HT) \right]. \tag{1.16} \]

This is again analogous to formula (1.9), postponed by one step. Finally, we plug the values
$X_1(H) = V_1(H)$ and $X_1(T) = V_1(T)$ into the two equations implicit in (1.10). The solution of these
equations for $\Delta_0$ and $V_0$ is the same as the solution of (1.3) and (1.4), and results again in (1.6) and
(1.9).
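
The two-period argument can be traced numerically. The sketch below is illustrative only: it takes
$S_0$, $u$, $d$ from Example 1.1 and assumes $r = 0.25$ and $K = 5$, which are not fixed by the text at
this point. It computes the time-1 values and hedges, then the time-0 value and hedge, and finally
follows the portfolio along each of the four paths using (1.10) and (1.11).

```python
from itertools import product

S0, U, D, R, K = 4.0, 2.0, 0.5, 0.25, 5.0   # R and K are assumed for illustration
P = (1 + R - D) / (U - D)                    # risk-neutral probability of H, as in (1.8)
Q = 1 - P

def price(path):
    """Stock price after the tosses in `path` (a string of 'H' and 'T')."""
    s = S0
    for toss in path:
        s *= U if toss == "H" else D
    return s

def payoff(path):
    """Call payoff at time 2 along a two-toss path."""
    return max(price(path) - K, 0.0)

# Time-1 option values and hedges, following (1.12)-(1.16).
v1 = {w: (P * payoff(w + "H") + Q * payoff(w + "T")) / (1 + R) for w in "HT"}
delta1 = {
    w: (payoff(w + "H") - payoff(w + "T")) / (price(w + "H") - price(w + "T"))
    for w in "HT"
}

# Time-0 value and hedge: (1.9) and (1.6) applied to the time-1 values.
v0 = (P * v1["H"] + Q * v1["T"]) / (1 + R)
delta0 = (v1["H"] - v1["T"]) / (price("H") - price("T"))

# Follow the portfolio along each path and record its final value.
final_wealth = {}
for w1, w2 in product("HT", repeat=2):
    x1 = delta0 * price(w1) + (1 + R) * (v0 - delta0 * S0)                      # (1.10)
    x2 = delta1[w1] * price(w1 + w2) + (1 + R) * (x1 - delta1[w1] * price(w1))  # (1.11)
    final_wealth[w1 + w2] = x2

print(v0, delta0, delta1, final_wealth)
```

On every path the final wealth should agree with the option payoff, which is the content of the six
equations solved above.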


The pattern emerging here persists, regardless of the number of periods. If $V_k$ denotes the value at
time $k$ of a derivative security, and this depends on the first $k$ coin tosses $\omega_1, \ldots, \omega_k$, then at time
$k - 1$, after the first $k - 1$ tosses $\omega_1, \ldots, \omega_{k-1}$ are known, the portfolio to hedge a short position
should hold $\Delta_{k-1}(\omega_1, \ldots, \omega_{k-1})$ shares of stock, where

\[ \Delta_{k-1}(\omega_1, \ldots, \omega_{k-1}) = \frac{V_k(\omega_1, \ldots, \omega_{k-1}, H) - V_k(\omega_1, \ldots, \omega_{k-1}, T)}{S_k(\omega_1, \ldots, \omega_{k-1}, H) - S_k(\omega_1, \ldots, \omega_{k-1}, T)}, \tag{1.17} \]

and the value at time $k - 1$ of the derivative security, when the first $k - 1$ coin tosses result in the
outcomes $\omega_1, \ldots, \omega_{k-1}$, is given by

\[ V_{k-1}(\omega_1, \ldots, \omega_{k-1}) = \frac{1}{1 + r} \left[ \tilde{p} V_k(\omega_1, \ldots, \omega_{k-1}, H) + \tilde{q} V_k(\omega_1, \ldots, \omega_{k-1}, T) \right]. \tag{1.18} \]

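
The recursion (1.17) and (1.18) translates directly into a backward loop over toss prefixes. The
following Python sketch is one possible arrangement, not the notes' own algorithm; the function
name `backward_induction`, the dictionary layout, and the example parameters ($r = 0.25$, $K = 5$,
three periods) are assumptions made for illustration.

```python
from itertools import product

def backward_induction(n, s0, u, d, r, payoff):
    """Apply (1.17) and (1.18) for k = n, ..., 1.

    `payoff` maps a toss sequence (a string of 'H'/'T' of length n) to V_n.
    Returns dicts `value` and `delta` keyed by toss prefixes ('' is time 0).
    """
    p = (1 + r - d) / (u - d)
    q = 1 - p

    def stock(path):
        return s0 * u ** path.count("H") * d ** path.count("T")

    value = {"".join(w): payoff("".join(w)) for w in product("HT", repeat=n)}
    delta = {}
    for k in range(n, 0, -1):
        for prefix in ("".join(w) for w in product("HT", repeat=k - 1)):
            v_h, v_t = value[prefix + "H"], value[prefix + "T"]
            delta[prefix] = (v_h - v_t) / (stock(prefix + "H") - stock(prefix + "T"))  # (1.17)
            value[prefix] = (p * v_h + q * v_t) / (1 + r)                              # (1.18)
    return value, delta

# Illustrative use: a three-period call with assumed strike 5 and assumed r = 0.25.
def call_payoff(path):
    return max(4.0 * 2.0 ** path.count("H") * 0.5 ** path.count("T") - 5.0, 0.0)

value, delta = backward_induction(3, 4.0, 2.0, 0.5, 0.25, call_payoff)
print(value[""], delta[""])
```

Keying the dictionaries by the full toss prefix, rather than by the number of heads, keeps the
sketch valid for payoffs that depend on the whole path, at the cost of storing $2^k$ entries at time $k$.
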
1.2 Finite Probability Spaces 
Let $\Omega$ be a set with finitely many elements. An example to keep in mind is the set

\[ \Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\} \tag{2.1} \]

of all possible outcomes of three coin tosses. Let $\mathcal{F}$ be the set of all subsets of $\Omega$. Some sets in $\mathcal{F}$
are $\emptyset$, $\{HHH, HHT, HTH, HTT\}$, $\{TTT\}$, and $\Omega$ itself. How many sets are there in $\mathcal{F}$?




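Definition 1.1 A probability measure $\mathbb{P}$ is a function mapping $\mathcal{F}$ into $[0, 1]$ with the following
properties:

(i) $\mathbb{P}(\Omega) = 1$,

(ii) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{F}$, then

\[ \mathbb{P}\left( \bigcup_{k=1}^{\infty} A_k \right) = \sum_{k=1}^{\infty} \mathbb{P}(A_k). \]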


Probability measures have the following interpretation. Let $A$ be a subset of $\Omega$. Imagine that $\Omega$ is
the set of all possible outcomes of some random experiment. There is a certain probability, between
0 and 1, that when that experiment is performed, the outcome will lie in the set $A$. We think of
$\mathbb{P}(A)$ as this probability.


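Example 1.2 Suppose a coin has probability $\frac{1}{3}$ for $H$ and $\frac{2}{3}$ for $T$. For the individual elements of
$\Omega$ in (2.1), define

\begin{align*}
\mathbb{P}\{HHH\} &= (1/3)^3, & \mathbb{P}\{HHT\} &= (1/3)^2 (2/3), \\
\mathbb{P}\{HTH\} &= (1/3)^2 (2/3), & \mathbb{P}\{HTT\} &= (1/3)(2/3)^2, \\
\mathbb{P}\{THH\} &= (1/3)^2 (2/3), & \mathbb{P}\{THT\} &= (1/3)(2/3)^2, \\
\mathbb{P}\{TTH\} &= (1/3)(2/3)^2, & \mathbb{P}\{TTT\} &= (2/3)^3.
\end{align*}

For $A \in \mathcal{F}$, we define

\[ \mathbb{P}(A) = \sum_{\omega \in A} \mathbb{P}\{\omega\}. \tag{2.2} \]

For example,

\[ \mathbb{P}\{HHH, HHT, HTH, HTT\} = (1/3)^3 + 2 (1/3)^2 (2/3) + (1/3)(2/3)^2 = 1/3, \]

which is another way of saying that the probability of $H$ on the first toss is $\frac{1}{3}$.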
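
The construction in (2.2) is easy to reproduce with exact arithmetic. The sketch below is
illustrative and rests on two assumptions that should be read as such: the coin probabilities are
taken to be 1/3 for $H$ and 2/3 for $T$, and the tosses are treated as independent, so that the
probability of a single outcome is the product of the per-toss probabilities. The function names are
hypothetical.

```python
from fractions import Fraction
from itertools import product

P_H, P_T = Fraction(1, 3), Fraction(2, 3)   # assumed coin probabilities
OMEGA = ["".join(w) for w in product("HT", repeat=3)]

def prob_point(omega):
    """Probability of a single outcome, assuming independent tosses."""
    result = Fraction(1)
    for toss in omega:
        result *= P_H if toss == "H" else P_T
    return result

def prob(event):
    """P(A) as the sum of P{omega} over omega in A, as in (2.2)."""
    return sum((prob_point(w) for w in event), Fraction(0))

first_toss_heads = {w for w in OMEGA if w[0] == "H"}
print(prob(first_toss_heads), prob(OMEGA))
```

Using `Fraction` avoids rounding, so property (i) of Definition 1.1 can be checked as an exact
equality rather than up to floating-point error.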


As in the above example, it is generally the case that we specify a probability measure on only some
of the subsets of $\Omega$ and then use property (ii) of Definition 1.1 to determine $\mathbb{P}(A)$ for the remaining
sets $A \in \mathcal{F}$. In the above example, we specified the probability measure only for the sets containing
a single element, and then used Definition 1.1(ii) in the form (2.2) (see Problem 1.4(ii)) to determine
$\mathbb{P}$ for all the other sets in $\mathcal{F}$.


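Definition 1.2 Let $\Omega$ be a nonempty set. A $\sigma$-algebra is a collection $\mathcal{G}$ of subsets of $\Omega$ with the
following three properties:

(i) $\emptyset \in \mathcal{G}$,

(ii) If $A \in \mathcal{G}$, then its complement $A^c \in \mathcal{G}$,

(iii) If $A_1, A_2, A_3, \ldots$ is a sequence of sets in $\mathcal{G}$, then $\bigcup_{k=1}^{\infty} A_k$ is also in $\mathcal{G}$.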


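Here are some important $\sigma$-algebras of subsets of the set $\Omega$ in Example 1.2:

\begin{align*}
\mathcal{F}_0 &= \{\emptyset, \Omega\}, \\
\mathcal{F}_1 &= \{\emptyset, \Omega, \{HHH, HHT, HTH, HTT\}, \{THH, THT, TTH, TTT\}\}, \\
\mathcal{F}_2 &= \{\emptyset, \Omega, \{HHH, HHT\}, \{HTH, HTT\}, \{THH, THT\}, \{TTH, TTT\}, \\
&\qquad \text{and all sets which can be built by taking unions of these}\}, \\
\mathcal{F}_3 &= \mathcal{F} = \text{the set of all subsets of } \Omega.
\end{align*}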


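To simplify notation a bit, let us define

\begin{align*}
A_H &\triangleq \{HHH, HHT, HTH, HTT\} = \{H \text{ on the first toss}\}, \\
A_T &\triangleq \{THH, THT, TTH, TTT\} = \{T \text{ on the first toss}\},
\end{align*}

so that

\[ \mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\}, \]

and let us define

\begin{align*}
A_{HH} &\triangleq \{HHH, HHT\} = \{HH \text{ on the first two tosses}\}, \\
A_{HT} &\triangleq \{HTH, HTT\} = \{HT \text{ on the first two tosses}\}, \\
A_{TH} &\triangleq \{THH, THT\} = \{TH \text{ on the first two tosses}\}, \\
A_{TT} &\triangleq \{TTH, TTT\} = \{TT \text{ on the first two tosses}\},
\end{align*}

so that

\begin{align*}
\mathcal{F}_2 = \{ &\emptyset, \Omega, A_{HH}, A_{HT}, A_{TH}, A_{TT}, \\
&A_H, A_T, A_{HH} \cup A_{TH}, A_{HH} \cup A_{TT}, A_{HT} \cup A_{TH}, A_{HT} \cup A_{TT}, \\
&A_{HH}^c, A_{HT}^c, A_{TH}^c, A_{TT}^c \}.
\end{align*}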
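
The listings of these collections can be checked mechanically. The sketch below, in Python, builds
each collection as the set of all unions of the blocks of a partition of the sample space and then
verifies the three properties of Definition 1.2; on a finite collection, closure under pairwise unions
is enough for property (iii). The helper names are hypothetical.

```python
from itertools import combinations, product

OMEGA = frozenset("".join(w) for w in product("HT", repeat=3))

def sigma_algebra_from_partition(blocks):
    """All unions of blocks of a finite partition (the empty union gives the empty set)."""
    blocks = [frozenset(b) for b in blocks]
    sets = set()
    for size in range(len(blocks) + 1):
        for chosen in combinations(blocks, size):
            sets.add(frozenset().union(*chosen))
    return sets

def is_sigma_algebra(collection, omega):
    """Check the three properties of Definition 1.2 on a finite collection."""
    has_empty = frozenset() in collection
    closed_complement = all(omega - a in collection for a in collection)
    closed_union = all(a | b in collection for a in collection for b in collection)
    return has_empty and closed_complement and closed_union

def first_tosses(prefix):
    """Outcomes whose initial tosses equal `prefix`, e.g. 'H' or 'HT'."""
    return {w for w in OMEGA if w.startswith(prefix)}

F1 = sigma_algebra_from_partition([first_tosses("H"), first_tosses("T")])
F2 = sigma_algebra_from_partition([first_tosses(p) for p in ("HH", "HT", "TH", "TT")])
print(len(F1), len(F2), is_sigma_algebra(F2, OMEGA), F1 <= F2)
```

A partition into $m$ blocks generates $2^m$ sets, which accounts for the four sets determined by the
first toss and the sixteen sets determined by the first two tosses.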


We interpret $\sigma$-algebras as a record of information. Suppose the coin is tossed three times, and you
are not told the outcome, but you are told, for every set in $\mathcal{F}_1$, whether or not the outcome is in that
set. For example, you would be told that the outcome is not in $\emptyset$ and is in $\Omega$. Moreover, you might
be told that the outcome is not in $A_H$ but is in $A_T$. In effect, you have been told that the first toss
was a $T$, and nothing more. The $\sigma$-algebra $\mathcal{F}_1$ is said to contain the “information of the first toss”,
which is usually called the “information up to time 1”. Similarly, $\mathcal{F}_2$ contains the “information of
the first two tosses,” which is the “information up to time 2.” The $\sigma$-algebra $\mathcal{F}_3 = \mathcal{F}$ contains “full
information” about the outcome of all three tosses. The so-called “trivial” $\sigma$-algebra $\mathcal{F}_0$ contains no
information. Knowing whether the outcome $\omega$ of the three tosses is in $\emptyset$ (it is not) and whether it is
in $\Omega$ (it is) tells you nothing about $\omega$.


Definition 1.3 Let $\Omega$ be a nonempty finite set. A filtration is a sequence of $\sigma$-algebras
$\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots$ such that each $\sigma$-algebra in the sequence contains all the sets contained by the
previous $\sigma$-algebra.


Definition 1.4 Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. A
random variable is a function mapping $\Omega$ into $\mathbb{R}$.


Example 1.3 Let $\Omega$ be given by (2.1) and consider the binomial asset pricing Example 1.1, where
$S_0 = 4$, $u = 2$ and $d = \frac{1}{2}$. Then $S_0$, $S_1$, $S_2$ and $S_3$ are all random variables. For example,
$S_2(HHT) = u^2 S_0 = 16$. The “random variable” $S_0$ is really not random, since $S_0(\omega) = 4$ for all
$\omega \in \Omega$. Nonetheless, it is a function mapping $\Omega$ into $\mathbb{R}$, and thus technically a random variable,
albeit a degenerate one.


A random variable maps $\Omega$ into $\mathbb{R}$, and we can look at the preimage under the random variable of
sets in $\mathbb{R}$. Consider, for example, the random variable $S_2$ of Example 1.1. We have

$$S_2(HHH) = S_2(HHT) = 16,$$
$$S_2(HTH) = S_2(HTT) = S_2(THH) = S_2(THT) = 4,$$
$$S_2(TTH) = S_2(TTT) = 1.$$

Let us consider the interval $[4, 27]$. The preimage under $S_2$ of this interval is defined to be
$$\{\omega \in \Omega;\ S_2(\omega) \in [4, 27]\} = \{\omega \in \Omega;\ 4 \le S_2 \le 27\} = A_{TT}^c.$$
The complete list of subsets of $\Omega$ we can get as preimages of sets in $\mathbb{R}$ is:
$$\emptyset,\ \Omega,\ A_{HH},\ A_{HT} \cup A_{TH},\ A_{TT},$$


and sets which can be built by taking unions of these. This collection of sets is a $\sigma$-algebra, called
the $\sigma$-algebra generated by the random variable $S_2$, and is denoted by $\sigma(S_2)$. The information
content of this $\sigma$-algebra is exactly the information learned by observing $S_2$. More specifically,
suppose the coin is tossed three times and you do not know the outcome $\omega$, but someone is willing
to tell you, for each set in $\sigma(S_2)$, whether $\omega$ is in the set. You might be told, for example, that $\omega$ is
not in $A_{HH}$, is in $A_{HT} \cup A_{TH}$, and is not in $A_{TT}$. Then you know that in the first two tosses, there
was a head and a tail, and you know nothing more. This information is the same you would have
gotten by being told that the value of $S_2(\omega)$ is 4.


Note that $\mathcal{F}_2$ defined earlier contains all the sets which are in $\sigma(S_2)$, and even more. This means
that the information in the first two tosses is greater than the information in $S_2$. In particular, if you
see the first two tosses, you can distinguish $A_{HT}$ from $A_{TH}$, but you cannot make this distinction
from knowing the value of $S_2$ alone.
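The preimage construction can also be carried out mechanically. The sketch below (Python; the function name `sigma` and the string encoding of outcomes are our own illustrative choices, with $S_0 = 4$, $u = 2$, $d = \frac{1}{2}$ taken from the example above) collects every preimage $\{S_2 \in A\}$. On a finite space it is enough to let $A$ range over subsets of the set of values that $S_2$ actually takes.

```python
from fractions import Fraction
from itertools import combinations, product

S0, u, d = 4, 2, Fraction(1, 2)
OMEGA = ["".join(w) for w in product("HT", repeat=3)]

def S2(w):
    """Stock price after two tosses; the third toss is ignored."""
    price = S0
    for toss in w[:2]:
        price *= u if toss == "H" else d
    return price

def sigma(X):
    """sigma(X): the collection of all preimages {X in A}."""
    values = sorted({X(w) for w in OMEGA})
    result = set()
    for r in range(len(values) + 1):
        for A in combinations(values, r):
            result.add(frozenset(w for w in OMEGA if X(w) in A))
    return result

sigma_S2 = sigma(S2)
A_HT = frozenset({"HTH", "HTT"})
print(len(sigma_S2))        # number of sets in sigma(S_2)
print(A_HT in sigma_S2)     # can S_2 alone tell HT apart from TH?
```

The second line of output reflects the remark above: $A_{HT}$ belongs to $\mathcal{F}_2$ but not to $\sigma(S_2)$.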




Definition 1.5 Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $X$
be a random variable on $(\Omega, \mathcal{F})$. The $\sigma$-algebra $\sigma(X)$ generated by $X$ is defined to be the collection
of all sets of the form $\{\omega \in \Omega;\ X(\omega) \in A\}$, where $A$ is a subset of $\mathbb{R}$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of
$\mathcal{F}$. We say that $X$ is $\mathcal{G}$-measurable if every set in $\sigma(X)$ is also in $\mathcal{G}$.


Note: We normally write simply $\{X \in A\}$ rather than $\{\omega \in \Omega;\ X(\omega) \in A\}$.


Definition 1.6 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $\mathbb{P}$ be
a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. Given any set $A \subset \mathbb{R}$, we
define the induced measure of $A$ to be

$$\mathcal{L}_X(A) \triangleq \mathbb{P}\{X \in A\}.$$


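In other words, the induced measure of a set $A$ tells us the probability that $X$ takes a value in $A$. In
the case of $S_2$ above with the probability measure of Example 1.2, some sets in $\mathbb{R}$ and their induced
measures are:

$$\mathcal{L}_{S_2}(\emptyset) = \mathbb{P}(\emptyset) = 0,$$
$$\mathcal{L}_{S_2}(\mathbb{R}) = \mathbb{P}(\Omega) = 1,$$
$$\mathcal{L}_{S_2}[0, \infty) = \mathbb{P}(\Omega) = 1,$$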


$$\mathcal{L}_{S_2}[0, 3] = \mathbb{P}\{S_2 = 1\} = \mathbb{P}(A_{TT}) = \left(\tfrac{2}{3}\right)^2.$$


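In fact, the induced measure of $S_2$ places a mass of size $\left(\frac{1}{3}\right)^2 = \frac{1}{9}$ at the number 16, a mass of size
$\frac{4}{9}$ at the number 4, and a mass of size $\left(\frac{2}{3}\right)^2 = \frac{4}{9}$ at the number 1. A common way to record this
information is to give the cumulative distribution function $F_{S_2}(x)$ of $S_2$, defined by

$$F_{S_2}(x) \triangleq \mathbb{P}(S_2 \le x) =
\begin{cases}
0, & \text{if } x < 1, \\
\frac{4}{9}, & \text{if } 1 \le x < 4, \\
\frac{8}{9}, & \text{if } 4 \le x < 16, \\
1, & \text{if } 16 \le x.
\end{cases} \tag{2.3}$$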


By the distribution of a random variable $X$, we mean any of the several ways of characterizing
$\mathcal{L}_X$. If $X$ is discrete, as in the case of $S_2$ above, we can either tell where the masses are and how
large they are, or tell what the cumulative distribution function is. (Later we will consider random
variables $X$ which have densities, in which case the induced measure of a set $A \subset \mathbb{R}$ is the integral
of the density over the set $A$.)
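Both descriptions of the distribution of $S_2$ can be computed directly from the sample space. The Python sketch below assumes the probability of $H$ is $\frac{1}{3}$, which is the value consistent with the masses quoted above; the names `law` and `cdf` are ours. It accumulates the mass that $\mathbb{P}$ places on each value of $S_2$ and then evaluates the cumulative distribution function at a few points.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)              # assumed probability of H
q = 1 - p
S0, u, d = 4, 2, Fraction(1, 2)
OMEGA = ["".join(w) for w in product("HT", repeat=3)]

def prob(w):
    """P{omega} for independent tosses."""
    return p ** w.count("H") * q ** w.count("T")

def S2(w):
    price = Fraction(S0)
    for toss in w[:2]:
        price *= u if toss == "H" else d
    return price

# Induced measure: total probability of the outcomes mapped to each value.
law = {}
for w in OMEGA:
    law[S2(w)] = law.get(S2(w), 0) + prob(w)

def cdf(x):
    """F(x) = P(S_2 <= x), summing the point masses at or below x."""
    return sum(mass for value, mass in law.items() if value <= x)

print(sorted(law.items()))
print([cdf(x) for x in (0, 1, 4, 16)])
```

Changing `p` changes `law` and `cdf` but leaves the function `S2` untouched, which is the distinction drawn in the note that follows.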


Important Note. In order to work through the concept of a risk-neutral measure, we set up the
definitions to make a clear distinction between random variables and their distributions.

A random variable is a mapping from $\Omega$ to $\mathbb{R}$, nothing more. It has an existence quite apart from
discussion of probabilities. For example, in the discussion above, $S_2(TTH) = S_2(TTT) = 1$,
regardless of whether the probability for $H$ is $\frac{1}{3}$ or $\frac{1}{2}$.




The distribution of a random variable is a measure $\mathcal{L}_X$ on $\mathbb{R}$, i.e., a way of assigning probabilities
to sets in $\mathbb{R}$. It depends on the random variable $X$ and the probability measure $\mathbb{P}$ we use in $\Omega$. If we
set the probability of $H$ to be $\frac{1}{3}$, then $\mathcal{L}_{S_2}$ assigns mass $\frac{1}{9}$ to the number 16. If we set the probability
of $H$ to be $\frac{1}{2}$, then $\mathcal{L}_{S_2}$ assigns mass $\frac{1}{4}$ to the number 16. The distribution of $S_2$ has changed, but
the random variable has not. It is still defined by

$$S_2(HHH) = S_2(HHT) = 16,$$
$$S_2(HTH) = S_2(HTT) = S_2(THH) = S_2(THT) = 4,$$
$$S_2(TTH) = S_2(TTT) = 1.$$


Thus, a random variable can have more than one distribution (a “market” or “objective” distribution, 
and a “risk-neutral” distribution). 


In a similar vein, two different random variables can have the same distribution. Suppose in the
binomial model of Example 1.1, the probability of $H$ and the probability of $T$ is $\frac{1}{2}$. Consider a
European call with strike price 14 expiring at time 2. The payoff of the call at time 2 is the random
variable $(S_2 - 14)^+$, which takes the value 2 if $\omega = HHH$ or $\omega = HHT$, and takes the value 0 in
every other case. The probability the payoff is 2 is $\frac{1}{4}$, and the probability it is zero is $\frac{3}{4}$. Consider also
a European put with strike price 3 expiring at time 2. The payoff of the put at time 2 is $(3 - S_2)^+$,
which takes the value 2 if $\omega = TTH$ or $\omega = TTT$. Like the payoff of the call, the payoff of the
put is 2 with probability $\frac{1}{4}$ and 0 with probability $\frac{3}{4}$. The payoffs of the call and the put are different
random variables having the same distribution.
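The call and put comparison is easy to verify by enumeration. In the following Python sketch (the dictionary encoding of random variables and the helper `law` are illustrative choices of ours), each payoff is stored as a map from outcomes to values, and its distribution is obtained by summing probabilities over outcomes with the same payoff.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

OMEGA = ["".join(w) for w in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in OMEGA}          # p = q = 1/2

def S2(w):
    """S_2 depends only on the first two tosses."""
    return {"HH": 16, "HT": 4, "TH": 4, "TT": 1}[w[:2]]

call = {w: max(S2(w) - 14, 0) for w in OMEGA}   # (S_2 - 14)^+
put = {w: max(3 - S2(w), 0) for w in OMEGA}     # (3 - S_2)^+

def law(X):
    """Distribution of X: total probability attached to each value."""
    dist = Counter()
    for w in OMEGA:
        dist[X[w]] += P[w]
    return dict(dist)

print(law(call) == law(put))   # same distribution?
print(call == put)             # same random variable?
```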


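Definition 1.7 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $\mathbb{P}$ be
a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. The expected value of $X$ is
defined to be

$$\mathbb{E}X \triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}\{\omega\}. \tag{2.4}$$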


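Notice that the expected value in (2.4) is defined to be a sum over the sample space $\Omega$. Since $\Omega$ is a
finite set, $X$ can take only finitely many values, which we label $x_1, \ldots, x_n$. We can partition $\Omega$ into
the subsets $\{X = x_1\}, \ldots, \{X = x_n\}$, and then rewrite (2.4) as

$$\begin{aligned}
\mathbb{E}X &\triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}\{\omega\} \\
&= \sum_{k=1}^{n} \sum_{\omega \in \{X = x_k\}} X(\omega)\,\mathbb{P}\{\omega\} \\
&= \sum_{k=1}^{n} x_k \sum_{\omega \in \{X = x_k\}} \mathbb{P}\{\omega\} \\
&= \sum_{k=1}^{n} x_k\,\mathbb{P}\{X = x_k\} \\
&= \sum_{k=1}^{n} x_k\,\mathcal{L}_X\{x_k\}.
\end{aligned}$$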




Thus, although the expected value is defined as a sum over the sample space $\Omega$, we can also write it
as a sum over $\mathbb{R}$.


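To make the above set of equations absolutely clear, we consider $S_2$ with the distribution given by
(2.3). The definition of $\mathbb{E}S_2$ is

$$\begin{aligned}
\mathbb{E}S_2 &= S_2(HHH)\,\mathbb{P}\{HHH\} + S_2(HHT)\,\mathbb{P}\{HHT\} \\
&\quad + S_2(HTH)\,\mathbb{P}\{HTH\} + S_2(HTT)\,\mathbb{P}\{HTT\} \\
&\quad + S_2(THH)\,\mathbb{P}\{THH\} + S_2(THT)\,\mathbb{P}\{THT\} \\
&\quad + S_2(TTH)\,\mathbb{P}\{TTH\} + S_2(TTT)\,\mathbb{P}\{TTT\} \\
&= 16 \cdot \mathbb{P}(A_{HH}) + 4 \cdot \mathbb{P}(A_{HT} \cup A_{TH}) + 1 \cdot \mathbb{P}(A_{TT}) \\
&= 16 \cdot \mathbb{P}\{S_2 = 16\} + 4 \cdot \mathbb{P}\{S_2 = 4\} + 1 \cdot \mathbb{P}\{S_2 = 1\} \\
&= 16 \cdot \mathcal{L}_{S_2}\{16\} + 4 \cdot \mathcal{L}_{S_2}\{4\} + 1 \cdot \mathcal{L}_{S_2}\{1\} \\
&= 16 \cdot \tfrac{1}{9} + 4 \cdot \tfrac{4}{9} + 1 \cdot \tfrac{4}{9} \\
&= \tfrac{36}{9} = 4.
\end{aligned}$$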


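Definition 1.8 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $\mathbb{P}$ be a
probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. The variance of $X$ is defined
to be the expected value of $(X - \mathbb{E}X)^2$, i.e.,

$$\mathrm{Var}(X) \triangleq \sum_{\omega \in \Omega} (X(\omega) - \mathbb{E}X)^2\,\mathbb{P}\{\omega\}. \tag{2.5}$$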


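Once again, we can rewrite (2.5) as a sum over $\mathbb{R}$ rather than over $\Omega$. Indeed, if $X$ takes the values
$x_1, \ldots, x_n$, then

$$\mathrm{Var}(X) = \sum_{k=1}^{n} (x_k - \mathbb{E}X)^2\,\mathbb{P}\{X = x_k\} = \sum_{k=1}^{n} (x_k - \mathbb{E}X)^2\,\mathcal{L}_X\{x_k\}.$$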
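The agreement between the sum over $\Omega$ and the sum over $\mathbb{R}$ can be confirmed numerically for $S_2$. The Python sketch below assumes the probability of $H$ is $\frac{1}{3}$ (the value consistent with the masses $\frac{1}{9}$, $\frac{4}{9}$, $\frac{4}{9}$ used above); the variable names are ours. It computes the mean and variance once outcome by outcome, as in (2.4) and (2.5), and once value by value using the induced measure.

```python
from fractions import Fraction
from itertools import product

p, q = Fraction(1, 3), Fraction(2, 3)       # assumed probabilities of H and T
OMEGA = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in OMEGA}
S2 = {w: {"HH": 16, "HT": 4, "TH": 4, "TT": 1}[w[:2]] for w in OMEGA}

# Sums over the sample space, as in (2.4) and (2.5).
mean_omega = sum(S2[w] * P[w] for w in OMEGA)
var_omega = sum((S2[w] - mean_omega) ** 2 * P[w] for w in OMEGA)

# Sums over the values x_k, weighted by the induced measure of {x_k}.
law = {}
for w in OMEGA:
    law[S2[w]] = law.get(S2[w], 0) + P[w]
mean_R = sum(x * m for x, m in law.items())
var_R = sum((x - mean_R) ** 2 * m for x, m in law.items())

print(mean_omega, mean_R)
print(var_omega, var_R)
```

Exact rational arithmetic (`Fraction`) is used so that the two routes can be compared with `==` rather than up to rounding.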


1.3. Lebesgue Measure and the Lebesgue Integral 


In this section, we consider the set of real numbers $\mathbb{R}$, which is uncountably infinite. We define the
Lebesgue measure of intervals in $\mathbb{R}$ to be their length. This definition and the properties of measure
determine the Lebesgue measure of many, but not all, subsets of $\mathbb{R}$. The collection of subsets of
$\mathbb{R}$ we consider, and for which Lebesgue measure is defined, is the collection of Borel sets defined
below.

We use Lebesgue measure to construct the Lebesgue integral, a generalization of the Riemann
integral. We need this integral because, unlike the Riemann integral, it can be defined on abstract
spaces, such as the space of infinite sequences of coin tosses or the space of paths of Brownian
motion. This section concerns the Lebesgue integral on the space $\mathbb{R}$ only; the generalization to
other spaces will be given later.




Definition 1.9 The Borel $\sigma$-algebra, denoted $\mathcal{B}(\mathbb{R})$, is the smallest $\sigma$-algebra containing all open
intervals in $\mathbb{R}$. The sets in $\mathcal{B}(\mathbb{R})$ are called Borel sets.


Every set which can be written down and just about every set imaginable is in $\mathcal{B}(\mathbb{R})$. The following
discussion of this fact uses the $\sigma$-algebra properties developed in Problem 1.3.


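By definition, every open interval $(a, b)$ is in $\mathcal{B}(\mathbb{R})$, where $a$ and $b$ are real numbers. Since $\mathcal{B}(\mathbb{R})$ is
a $\sigma$-algebra, every union of open intervals is also in $\mathcal{B}(\mathbb{R})$. For example, for every real number $a$,
the open half-line

$$(a, \infty) = \bigcup_{n=1}^{\infty} (a, a + n)$$

is a Borel set, as is

$$(-\infty, a) = \bigcup_{n=1}^{\infty} (a - n, a).$$

For real numbers $a$ and $b$, the union
$$(-\infty, a) \cup (b, \infty)$$
is Borel. Since $\mathcal{B}(\mathbb{R})$ is a $\sigma$-algebra, every complement of a Borel set is Borel, so $\mathcal{B}(\mathbb{R})$ contains
$$[a, b] = \big((-\infty, a) \cup (b, \infty)\big)^c.$$

This shows that every closed interval is Borel. In addition, the closed half-lines
$$[a, \infty) = \bigcup_{n=1}^{\infty} [a, a + n]$$

and

$$(-\infty, a] = \bigcup_{n=1}^{\infty} [a - n, a]$$

are Borel. Half-open and half-closed intervals are also Borel, since they can be written as intersections
of open half-lines and closed half-lines. For example,

$$(a, b] = (-\infty, b] \cap (a, \infty).$$

Every set which contains only one real number is Borel. Indeed, if $a$ is a real number, then

$$\{a\} = \bigcap_{n=1}^{\infty} \left(a - \frac{1}{n},\ a + \frac{1}{n}\right).$$

This means that every set containing finitely many real numbers is Borel; if $A = \{a_1, a_2, \ldots, a_n\}$,
then

$$A = \bigcup_{k=1}^{n} \{a_k\}.$$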




In fact, every set containing countably infinitely many numbers is Borel; if $A = \{a_1, a_2, \ldots\}$, then

$$A = \bigcup_{k=1}^{\infty} \{a_k\}.$$

This means that the set of rational numbers is Borel, as is its complement, the set of irrational
numbers.


There are, however, sets which are not Borel. We have just seen that any non-Borel set must have 
uncountably many points. 


Example 1.4 (The Cantor set.) This example gives a hint of how complicated a Borel set can be. 
We use it later when we discuss the sample space for an infinite sequence of coin tosses. 


Consider the unit interval $[0, 1]$, and remove the middle half, i.e., remove the open interval

$$A_1 \triangleq \left(\tfrac{1}{4}, \tfrac{3}{4}\right).$$

The remaining set

$$C_1 = \left[0, \tfrac{1}{4}\right] \cup \left[\tfrac{3}{4}, 1\right]$$

has two pieces. From each of these pieces, remove the middle half, i.e., remove the open set

$$A_2 \triangleq \left(\tfrac{1}{16}, \tfrac{3}{16}\right) \cup \left(\tfrac{13}{16}, \tfrac{15}{16}\right).$$

The remaining set

$$C_2 = \left[0, \tfrac{1}{16}\right] \cup \left[\tfrac{3}{16}, \tfrac{1}{4}\right] \cup \left[\tfrac{3}{4}, \tfrac{13}{16}\right] \cup \left[\tfrac{15}{16}, 1\right]$$

has four pieces. Continue this process, so at stage $k$, the set $C_k$ has $2^k$ pieces, and each piece has
length $\frac{1}{4^k}$. The Cantor set

$$C \triangleq \bigcap_{k=1}^{\infty} C_k$$

is defined to be the set of points not removed at any stage of this nonterminating process.

Note that the length of $A_1$, the first set removed, is $\frac{1}{2}$. The “length” of $A_2$, the second set removed,
is $\frac{1}{8} + \frac{1}{8} = \frac{1}{4}$. The “length” of the next set removed is $4 \cdot \frac{1}{32} = \frac{1}{8}$, and in general, the length of the
$k$-th set removed is $2^{-k}$. Thus, the total length removed is

$$\sum_{k=1}^{\infty} \frac{1}{2^k} = 1,$$

and so the Cantor set, the set of points not removed, has zero “length.”

Despite the fact that the Cantor set has no “length,” there are lots of points in this set. In particular,
none of the endpoints of the pieces of the sets $C_1, C_2, \ldots$ is ever removed. Thus, the points

$$0,\ \tfrac{1}{4},\ \tfrac{3}{4},\ 1,\ \tfrac{1}{16},\ \tfrac{3}{16},\ \tfrac{13}{16},\ \tfrac{15}{16},\ \tfrac{1}{64},\ \ldots$$

are all in $C$. This is a countably infinite set of points. We shall see eventually that the Cantor set
has uncountably many points. $\Box$
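The bookkeeping in the Cantor construction can be reproduced with exact fractions. The Python sketch below is an illustration only (the helper name `remove_middle_half` is ours). It starts from $[0, 1]$, removes the middle half of every piece at each stage, and tracks the number of pieces, the length of one piece, the length removed at that stage, and the running total removed.

```python
from fractions import Fraction

def remove_middle_half(pieces):
    """Replace each closed interval [a, b] by its two outer quarters."""
    out = []
    for a, b in pieces:
        quarter = (b - a) / 4
        out.append((a, a + quarter))
        out.append((b - quarter, b))
    return out

pieces = [(Fraction(0), Fraction(1))]    # start from the unit interval
removed_total = Fraction(0)
for k in range(1, 6):
    before = sum(b - a for a, b in pieces)
    pieces = remove_middle_half(pieces)
    after = sum(b - a for a, b in pieces)
    removed_total += before - after
    # stage, number of pieces, length of one piece, length removed, total removed
    print(k, len(pieces), pieces[0][1] - pieces[0][0], before - after, removed_total)
```

After $k$ stages the total removed is $1 - 2^{-k}$, which tends to 1, in agreement with the series above.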




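Definition 1.10 Let $\mathcal{B}(\mathbb{R})$ be the $\sigma$-algebra of Borel subsets of $\mathbb{R}$. A measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a
function $\mu$ mapping $\mathcal{B}(\mathbb{R})$ into $[0, \infty]$ with the following properties:

(i) $\mu(\emptyset) = 0$,

(ii) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{B}(\mathbb{R})$, then
$$\mu\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mu(A_k).$$

Lebesgue measure is defined to be the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ which assigns the measure of each
interval to be its length. Following Williams’s book, we denote Lebesgue measure by $\mu_0$.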


A measure has all the properties of a probability measure given in Problem 1.4, except that the total
measure of the space is not necessarily 1 (in fact, $\mu_0(\mathbb{R}) = \infty$), one no longer has the equation

$$\mu(A^c) = 1 - \mu(A)$$
in Problem 1.4(iii), and property (v) in Problem 1.4 needs to be modified to say:

(v) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{B}(\mathbb{R})$ with $A_1 \supset A_2 \supset \cdots$ and $\mu(A_1) < \infty$, then
$$\mu\left(\bigcap_{k=1}^{\infty} A_k\right) = \lim_{n \to \infty} \mu(A_n).$$

To see that the additional requirement $\mu(A_1) < \infty$ is needed in (v), consider
$$A_1 = [1, \infty),\quad A_2 = [2, \infty),\quad A_3 = [3, \infty),\ \ldots$$

Then $\bigcap_{k=1}^{\infty} A_k = \emptyset$, so $\mu_0\left(\bigcap_{k=1}^{\infty} A_k\right) = 0$, but $\lim_{n \to \infty} \mu_0(A_n) = \infty$.


We specify that the Lebesgue measure of each interval is its length, and that determines the Lebesgue 
measure of all other Borel sets. For example, the Lebesgue measure of the Cantor set in Example 
1.4 must be zero, because of the “length” computation given at the end of that example. 


The Lebesgue measure of a set containing only one point must be zero. In fact, since
$$\{a\} \subset \left(a - \frac{1}{n},\ a + \frac{1}{n}\right)$$
for every positive integer $n$, we must have
$$0 \le \mu_0\{a\} \le \mu_0\left(a - \frac{1}{n},\ a + \frac{1}{n}\right) = \frac{2}{n}.$$

Letting $n \to \infty$, we obtain
$$\mu_0\{a\} = 0.$$


The Lebesgue measure of a set containing countably many points must also be zero. Indeed, if
$A = \{a_1, a_2, \ldots\}$, then

$$\mu_0(A) = \sum_{k=1}^{\infty} \mu_0\{a_k\} = \sum_{k=1}^{\infty} 0 = 0.$$


The Lebesgue measure of a set containing uncountably many points can be either zero, positive and 
finite, or infinite. We may not compute the Lebesgue measure of an uncountable set by adding up 
the Lebesgue measure of its individual members, because there is no way to add up uncountably 
many numbers. The integral was invented to get around this problem. 


In order to think about Lebesgue integrals, we must first consider the functions to be integrated. 


Definition 1.11 Let $f$ be a function from $\mathbb{R}$ to $\mathbb{R}$. We say that $f$ is Borel-measurable if the set
$\{x \in \mathbb{R};\ f(x) \in A\}$ is in $\mathcal{B}(\mathbb{R})$ whenever $A \in \mathcal{B}(\mathbb{R})$. In the language of Section 2, we want the
$\sigma$-algebra generated by $f$ to be contained in $\mathcal{B}(\mathbb{R})$.


Definition 1.11 is purely technical and has nothing to do with keeping track of information. It is
difficult to conceive of a function which is not Borel-measurable, and we shall pretend such functions
don’t exist. Henceforth, “function mapping $\mathbb{R}$ to $\mathbb{R}$” will mean “Borel-measurable function
mapping $\mathbb{R}$ to $\mathbb{R}$” and “subset of $\mathbb{R}$” will mean “Borel subset of $\mathbb{R}$”.


Definition 1.12 An indicator function $g$ from $\mathbb{R}$ to $\mathbb{R}$ is a function which takes only the values 0
and 1. We call
$$A \triangleq \{x \in \mathbb{R};\ g(x) = 1\}$$

the set indicated by $g$. We define the Lebesgue integral of $g$ to be
$$\int_{\mathbb{R}} g\,d\mu_0 \triangleq \mu_0(A).$$
A simple function $h$ from $\mathbb{R}$ to $\mathbb{R}$ is a linear combination of indicators, i.e., a function of the form
$$h(x) = \sum_{k=1}^{n} c_k g_k(x),$$
where each $g_k$ is of the form

$$g_k(x) = \begin{cases} 1, & \text{if } x \in A_k, \\ 0, & \text{if } x \notin A_k, \end{cases}$$

and each $c_k$ is a real number. We define the Lebesgue integral of $h$ to be
$$\int_{\mathbb{R}} h\,d\mu_0 \triangleq \sum_{k=1}^{n} c_k \int_{\mathbb{R}} g_k\,d\mu_0 = \sum_{k=1}^{n} c_k\,\mu_0(A_k).$$
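For simple functions whose indicated sets are finite unions of intervals, the definition reduces to arithmetic on lengths. The Python sketch below is a minimal illustration under that restriction (general Borel sets are not representable this way, and the names `length` and `integrate_simple` are ours).

```python
from fractions import Fraction

def length(intervals):
    """Lebesgue measure of a finite disjoint union of intervals with endpoints (a, b)."""
    return sum(b - a for a, b in intervals)

def integrate_simple(terms):
    """Integral of h = sum_k c_k * 1_{A_k}, given as a list of pairs (c_k, A_k)."""
    return sum(c * length(A) for c, A in terms)

h = [
    (Fraction(2), [(0, 1)]),                              # h = 2 on [0, 1]
    (Fraction(-3), [(2, 4)]),                             # h = -3 on (2, 4)
    (Fraction(5), [(Fraction(1, 2), Fraction(1, 2))]),    # an extra 5 at the single point 1/2
]
print(integrate_simple(h))
```

The third term contributes nothing, since a single point has Lebesgue measure zero; for the same reason it does not matter whether interval endpoints are included.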


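Let $f$ be a nonnegative function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points. We
define the Lebesgue integral of $f$ to be

$$\int_{\mathbb{R}} f\,d\mu_0 \triangleq \sup\left\{ \int_{\mathbb{R}} h\,d\mu_0;\ h \text{ is simple and } h(x) \le f(x) \text{ for every } x \in \mathbb{R} \right\}.$$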




It is possible that this integral is infinite. If it is finite, we say that f is integrable. 


Finally, let $f$ be a function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points and the value
$-\infty$ at other points. We define the positive and negative parts of $f$ to be

$$f^+(x) \triangleq \max\{f(x), 0\}, \qquad f^-(x) \triangleq \max\{-f(x), 0\},$$

respectively, and we define the Lebesgue integral of $f$ to be

$$\int_{\mathbb{R}} f\,d\mu_0 \triangleq \int_{\mathbb{R}} f^+\,d\mu_0 - \int_{\mathbb{R}} f^-\,d\mu_0,$$

provided the right-hand side is not of the form $\infty - \infty$. If both $\int_{\mathbb{R}} f^+\,d\mu_0$ and $\int_{\mathbb{R}} f^-\,d\mu_0$ are finite
(or equivalently, $\int_{\mathbb{R}} |f|\,d\mu_0 < \infty$, since $|f| = f^+ + f^-$), we say that $f$ is integrable.


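Let $f$ be a function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points and the value $-\infty$ at
other points. Let $A$ be a subset of $\mathbb{R}$. We define

$$\int_A f\,d\mu_0 \triangleq \int_{\mathbb{R}} \mathbb{I}_A f\,d\mu_0,$$

where

$$\mathbb{I}_A(x) \triangleq \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A, \end{cases}$$

is the indicator function of $A$.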


The Lebesgue integral just defined is related to the Riemann integral in one very important way: if
the Riemann integral $\int_a^b f(x)\,dx$ is defined, then the Lebesgue integral $\int_{[a,b]} f\,d\mu_0$ agrees with the
Riemann integral. The Lebesgue integral has two important advantages over the Riemann integral.
The first is that the Lebesgue integral is defined for more functions, as we show in the following
examples.


Example 1.5 Let $Q$ be the set of rational numbers in $[0, 1]$, and consider $f \triangleq \mathbb{I}_Q$. Being a countable
set, $Q$ has Lebesgue measure zero, and so the Lebesgue integral of $f$ over $[0, 1]$ is

$$\int_{[0,1]} f\,d\mu_0 = 0.$$

To compute the Riemann integral $\int_0^1 f(x)\,dx$, we choose partition points $0 = x_0 < x_1 < \cdots <
x_n = 1$ and divide the interval $[0, 1]$ into subintervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$. In each
subinterval $[x_{k-1}, x_k]$ there is a rational point $q_k$, where $f(q_k) = 1$, and there is also an irrational
point $r_k$, where $f(r_k) = 0$. We approximate the Riemann integral from above by the upper sum

$$\sum_{k=1}^{n} f(q_k)(x_k - x_{k-1}) = \sum_{k=1}^{n} 1 \cdot (x_k - x_{k-1}) = 1,$$

and we also approximate it from below by the lower sum

$$\sum_{k=1}^{n} f(r_k)(x_k - x_{k-1}) = \sum_{k=1}^{n} 0 \cdot (x_k - x_{k-1}) = 0.$$

No matter how fine we take the partition of $[0, 1]$, the upper sum is always 1 and the lower sum is
always 0. Since these two do not converge to a common value as the partition becomes finer, the
Riemann integral is not defined. $\Box$


Example 1.6 Consider the function

$$f(x) \triangleq \begin{cases} \infty, & \text{if } x = 0, \\ 0, & \text{if } x \ne 0. \end{cases}$$

This is not a simple function because a simple function cannot take the value $\infty$. Every simple
function which lies between 0 and $f$ is of the form

$$h(x) \triangleq \begin{cases} y, & \text{if } x = 0, \\ 0, & \text{if } x \ne 0, \end{cases}$$

for some $y \in [0, \infty)$, and thus has Lebesgue integral

$$\int_{\mathbb{R}} h\,d\mu_0 = y\,\mu_0\{0\} = 0.$$
It follows that

$$\int_{\mathbb{R}} f\,d\mu_0 = \sup\left\{ \int_{\mathbb{R}} h\,d\mu_0;\ h \text{ is simple and } h(x) \le f(x) \text{ for every } x \in \mathbb{R} \right\} = 0.$$

Now consider the Riemann integral $\int_{-\infty}^{\infty} f(x)\,dx$, which for this function $f$ is the same as the
Riemann integral $\int_{-1}^{1} f(x)\,dx$. When we partition $[-1, 1]$ into subintervals, one of these will contain
the point 0, and when we compute the upper approximating sum for $\int_{-1}^{1} f(x)\,dx$, this point will
contribute $\infty$ times the length of the subinterval containing it. Thus the upper approximating sum is
$\infty$. On the other hand, the lower approximating sum is 0, and again the Riemann integral does not
exist. $\Box$


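The Lebesgue integral has all linearity and comparison properties one would expect of an integral.
In particular, for any two functions $f$ and $g$ and any real constant $c$,

$$\int_{\mathbb{R}} (f + g)\,d\mu_0 = \int_{\mathbb{R}} f\,d\mu_0 + \int_{\mathbb{R}} g\,d\mu_0,$$
$$\int_{\mathbb{R}} c f\,d\mu_0 = c \int_{\mathbb{R}} f\,d\mu_0,$$

and whenever $f(x) \le g(x)$ for all $x \in \mathbb{R}$, we have

$$\int_{\mathbb{R}} f\,d\mu_0 \le \int_{\mathbb{R}} g\,d\mu_0.$$

Finally, if $A$ and $B$ are disjoint sets, then

$$\int_{A \cup B} f\,d\mu_0 = \int_A f\,d\mu_0 + \int_B f\,d\mu_0.$$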




There are three convergence theorems satisfied by the Lebesgue integral. In each of these the situation
is that there is a sequence of functions $f_n$, $n = 1, 2, \ldots$ converging pointwise to a limiting
function $f$. Pointwise convergence just means that

$$\lim_{n \to \infty} f_n(x) = f(x) \quad \text{for every } x \in \mathbb{R}.$$

There are no such theorems for the Riemann integral, because the Riemann integral of the limiting
function $f$ is too often not defined. Before we state the theorems, we give two examples of
pointwise convergence which arise in probability theory.


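Example 1.7 Consider a sequence of normal densities, each with variance 1 and the $n$-th having
mean $n$:

$$f_n(x) \triangleq \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x - n)^2}{2}}.$$

These converge pointwise to the function
$$f(x) = 0 \quad \text{for every } x \in \mathbb{R}.$$
We have $\int_{\mathbb{R}} f_n\,d\mu_0 = 1$ for every $n$, so $\lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0 = 1$, but $\int_{\mathbb{R}} f\,d\mu_0 = 0$. $\Box$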


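Example 1.8 Consider a sequence of normal densities, each with mean 0 and the $n$-th having
variance $\frac{1}{n}$:

$$f_n(x) \triangleq \sqrt{\frac{n}{2\pi}}\, e^{-\frac{n x^2}{2}}.$$

These converge pointwise to the function

$$f(x) \triangleq \begin{cases} \infty, & \text{if } x = 0, \\ 0, & \text{if } x \ne 0. \end{cases}$$

We have again $\int_{\mathbb{R}} f_n\,d\mu_0 = 1$ for every $n$, so $\lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0 = 1$, but $\int_{\mathbb{R}} f\,d\mu_0 = 0$. The
function $f$ is not the Dirac delta; the Lebesgue integral of this function was already seen in Example
1.6 to be zero. $\Box$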
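Both examples can be observed numerically. The Python sketch below uses only the standard library; the helper names `normal_pdf` and `normal_mass` are ours, and the integral of each density is obtained from the error function rather than by quadrature. It evaluates the shifted densities of Example 1.7 at the fixed point $x = 0$ and the narrowing densities of Example 1.8 at the fixed point $x = 0.5$, alongside the total mass, which stays equal to 1.

```python
import math

def normal_pdf(x, mean, var):
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def normal_mass(a, b, mean, var):
    """Integral of the normal density over [a, b], via the error function."""
    s = math.sqrt(2 * var)
    return 0.5 * (math.erf((b - mean) / s) - math.erf((a - mean) / s))

for n in (1, 5, 25, 100):
    shifted = normal_pdf(0.0, mean=n, var=1.0)          # Example 1.7 at x = 0
    peaked = normal_pdf(0.5, mean=0.0, var=1.0 / n)     # Example 1.8 at x = 0.5
    total = normal_mass(-math.inf, math.inf, mean=n, var=1.0)
    print(n, shifted, peaked, total)
```

The pointwise values shrink toward zero as $n$ grows while the total mass does not, so the limit of the integrals and the integral of the limit differ.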


Theorem 3.1 (Fatou’s Lemma) Let $f_n$, $n = 1, 2, \ldots$ be a sequence of nonnegative functions
converging pointwise to a function $f$. Then

$$\int_{\mathbb{R}} f\,d\mu_0 \le \liminf_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0.$$

If $\lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0$ is defined, then Fatou’s Lemma has the simpler conclusion

$$\int_{\mathbb{R}} f\,d\mu_0 \le \lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0.$$

This is the case in Examples 1.7 and 1.8, where

$$\lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0 = 1,$$

while $\int_{\mathbb{R}} f\,d\mu_0 = 0$. We could modify either Example 1.7 or 1.8 by setting $g_n = f_n$ if $n$ is even,
but $g_n = 2 f_n$ if $n$ is odd. Now $\int_{\mathbb{R}} g_n\,d\mu_0 = 1$ if $n$ is even, but $\int_{\mathbb{R}} g_n\,d\mu_0 = 2$ if $n$ is odd. The
sequence $\left\{\int_{\mathbb{R}} g_n\,d\mu_0\right\}_{n=1}^{\infty}$ has two cluster points, 1 and 2. By definition, the smaller one, 1, is
$\liminf_{n \to \infty} \int_{\mathbb{R}} g_n\,d\mu_0$ and the larger one, 2, is $\limsup_{n \to \infty} \int_{\mathbb{R}} g_n\,d\mu_0$. Fatou’s Lemma guarantees
that even the smaller cluster point will be greater than or equal to the integral of the limiting function.


The key assumption in Fatou’s Lemma is that all the functions take only nonnegative values. Fatou’s
Lemma does not assume much, but it is not very satisfying because it does not conclude that

$$\int_{\mathbb{R}} f\,d\mu_0 = \lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0.$$
There are two sets of assumptions which permit this stronger conclusion.


Theorem 3.2 (Monotone Convergence Theorem) Let $f_n$, $n = 1, 2, \ldots$ be a sequence of functions
converging pointwise to a function $f$. Assume that

$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \cdots \quad \text{for every } x \in \mathbb{R}.$$

Then
$$\int_{\mathbb{R}} f\,d\mu_0 = \lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0,$$

where both sides are allowed to be $\infty$.


Theorem 3.3 (Dominated Convergence Theorem) Let $f_n$, $n = 1, 2, \ldots$ be a sequence of functions,
which may take either positive or negative values, converging pointwise to a function $f$. Assume
that there is a nonnegative integrable function $g$ (i.e., $\int_{\mathbb{R}} g\,d\mu_0 < \infty$) such that

$$|f_n(x)| \le g(x) \quad \text{for every } x \in \mathbb{R} \text{ for every } n.$$

Then
$$\int_{\mathbb{R}} f\,d\mu_0 = \lim_{n \to \infty} \int_{\mathbb{R}} f_n\,d\mu_0,$$
and both sides will be finite.


1.4 General Probability Spaces 


Definition 1.13 A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ consists of three objects:

(i) $\Omega$, a nonempty set, called the sample space, which contains all possible outcomes of some
random experiment;

(ii) $\mathcal{F}$, a $\sigma$-algebra of subsets of $\Omega$;

(iii) $\mathbb{P}$, a probability measure on $(\Omega, \mathcal{F})$, i.e., a function which assigns to each set $A \in \mathcal{F}$ a number
$\mathbb{P}(A) \in [0, 1]$, which represents the probability that the outcome of the random experiment
lies in the set $A$.




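Remark 1.1 We recall from Homework Problem 1.4 that a probability measure $\mathbb{P}$ has the following
properties:

(a) $\mathbb{P}(\emptyset) = 0$.

(b) (Countable additivity) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{F}$, then
$$\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mathbb{P}(A_k).$$

(c) (Finite additivity) If $n$ is a positive integer and $A_1, \ldots, A_n$ are disjoint sets in $\mathcal{F}$, then
$$\mathbb{P}(A_1 \cup \cdots \cup A_n) = \mathbb{P}(A_1) + \cdots + \mathbb{P}(A_n).$$

(d) If $A$ and $B$ are sets in $\mathcal{F}$ and $A \subset B$, then
$$\mathbb{P}(B) = \mathbb{P}(A) + \mathbb{P}(B \setminus A).$$
In particular,
$$\mathbb{P}(B) \ge \mathbb{P}(A).$$

(e) (Continuity from below.) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$ with $A_1 \subset A_2 \subset \cdots$, then
$$\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \lim_{n \to \infty} \mathbb{P}(A_n).$$

(f) (Continuity from above.) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$ with $A_1 \supset A_2 \supset \cdots$, then
$$\mathbb{P}\left(\bigcap_{k=1}^{\infty} A_k\right) = \lim_{n \to \infty} \mathbb{P}(A_n).$$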


We have already seen some examples of finite probability spaces. We repeat these and give some 
examples of infinite probability spaces as well. 


Example 1.9 Finite coin toss space.

Toss a coin $n$ times, so that $\Omega$ is the set of all sequences of $H$ and $T$ which have $n$ components.
We will use this space quite a bit, and so give it a name: $\Omega_n$. Let $\mathcal{F}$ be the collection of all subsets
of $\Omega_n$. Suppose the probability of $H$ on each toss is $p$, a number between zero and one. Then the
probability of $T$ is $q \triangleq 1 - p$. For each $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ in $\Omega_n$, we define

$$\mathbb{P}\{\omega\} \triangleq p^{\text{number of } H \text{ in } \omega} \cdot q^{\text{number of } T \text{ in } \omega}.$$
For each $A \in \mathcal{F}$, we define

$$\mathbb{P}(A) \triangleq \sum_{\omega \in A} \mathbb{P}\{\omega\}. \tag{4.1}$$

We can define $\mathbb{P}(A)$ this way because $A$ has only finitely many elements, and so only finitely many
terms appear in the sum on the right-hand side of (4.1). $\Box$
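The finite coin toss space is small enough to build explicitly. The Python sketch below is one possible encoding (the function names `coin_space` and `prob`, and the choice $n = 4$, $p = \frac{1}{3}$, are ours for illustration). It stores the probability of each outcome in a dictionary and evaluates the sum (4.1) for events given as membership tests.

```python
from fractions import Fraction
from itertools import product

def coin_space(n, p):
    """Return {omega: P{omega}} for n independent tosses with P(H) = p."""
    q = 1 - p
    return {
        "".join(w): p ** w.count("H") * q ** w.count("T")
        for w in product("HT", repeat=n)
    }

def prob(space, event):
    """P(A) as the finite sum (4.1) over the outcomes omega in A."""
    return sum(space[w] for w in space if event(w))

space = coin_space(4, Fraction(1, 3))
print(prob(space, lambda w: True))                # the whole space
print(prob(space, lambda w: w[1] == "H"))         # H on the second toss
print(prob(space, lambda w: w.count("H") == 2))   # exactly two heads
```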




Example 1.10 Infinite coin toss space.

Toss a coin repeatedly without stopping, so that $\Omega$ is the set of all nonterminating sequences of $H$
and $T$. We call this space $\Omega_\infty$. This is an uncountably infinite space, and we need to exercise some
care in the construction of the $\sigma$-algebra we will use here.

For each positive integer $n$, we define $\mathcal{F}_n$ to be the $\sigma$-algebra determined by the first $n$ tosses. For
example, $\mathcal{F}_2$ contains four basic sets,

$$\begin{aligned}
A_{HH} &\triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = H, \omega_2 = H\} \\
&= \text{The set of all sequences which begin with } HH, \\
A_{HT} &\triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = H, \omega_2 = T\} \\
&= \text{The set of all sequences which begin with } HT, \\
A_{TH} &\triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = T, \omega_2 = H\} \\
&= \text{The set of all sequences which begin with } TH, \\
A_{TT} &\triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = T, \omega_2 = T\} \\
&= \text{The set of all sequences which begin with } TT.
\end{aligned}$$

Because $\mathcal{F}_2$ is a $\sigma$-algebra, we must also put into it the sets $\emptyset$, $\Omega$, and all unions of the four basic
sets.


In the $\sigma$-algebra $\mathcal{F}$, we put every set in every $\sigma$-algebra $\mathcal{F}_n$, where $n$ ranges over the positive
integers. We also put in every other set which is required to make $\mathcal{F}$ be a $\sigma$-algebra. For example,
the set containing the single sequence

$$\{HHHHH\cdots\} = \{H \text{ on every toss}\}$$

is not in any of the $\mathcal{F}_n$ $\sigma$-algebras, because it depends on all the components of the sequence and
not just the first $n$ components. However, for each positive integer $n$, the set

$$\{H \text{ on the first } n \text{ tosses}\}$$

is in $\mathcal{F}_n$ and hence in $\mathcal{F}$. Therefore,

$$\{H \text{ on every toss}\} = \bigcap_{n=1}^{\infty} \{H \text{ on the first } n \text{ tosses}\}$$
is also in $\mathcal{F}$.


We next construct the probability measure $\mathbb{P}$ on $(\Omega_\infty, \mathcal{F})$ which corresponds to probability $p \in
(0, 1]$ for $H$ and probability $q = 1 - p$ for $T$. Let $A \in \mathcal{F}$ be given. If there is a positive integer $n$
such that $A \in \mathcal{F}_n$, then the description of $A$ depends on only the first $n$ tosses, and it is clear how to
define $\mathbb{P}(A)$. For example, suppose $A = A_{HH} \cup A_{TH}$, where these sets were defined earlier. Then
$A$ is in $\mathcal{F}_2$. We set $\mathbb{P}(A_{HH}) = p^2$ and $\mathbb{P}(A_{TH}) = qp$, and then we have

$$\mathbb{P}(A) = \mathbb{P}(A_{HH} \cup A_{TH}) = p^2 + qp = (p + q)p = p.$$

In other words, the probability of an $H$ on the second toss is $p$.




Let us now consider a set $A \in \mathcal{F}$ for which there is no positive integer $n$ such that $A \in \mathcal{F}_n$. Such
is the case for the set $\{H \text{ on every toss}\}$. To determine the probability of these sets, we write them
in terms of sets which are in $\mathcal{F}_n$ for positive integers $n$, and then use the properties of probability
measures listed in Remark 1.1. For example,

$$\begin{aligned}
\{H \text{ on the first toss}\} &\supset \{H \text{ on the first two tosses}\} \\
&\supset \{H \text{ on the first three tosses}\} \\
&\supset \cdots,
\end{aligned}$$
and
$$\bigcap_{n=1}^{\infty} \{H \text{ on the first } n \text{ tosses}\} = \{H \text{ on every toss}\}.$$

According to the continuity-from-above property in Remark 1.1,
$$\mathbb{P}\{H \text{ on every toss}\} = \lim_{n \to \infty} \mathbb{P}\{H \text{ on the first } n \text{ tosses}\} = \lim_{n \to \infty} p^n.$$

If $p = 1$, then $\mathbb{P}\{H \text{ on every toss}\} = 1$; otherwise, $\mathbb{P}\{H \text{ on every toss}\} = 0$.


A similar argument shows that if $0 < p < 1$ so that $0 < q < 1$, then every set in $\Omega_\infty$ which contains
only one element (nonterminating sequence of $H$ and $T$) has probability zero, and hence every set
which contains countably many elements also has probability zero. We are in a case very similar to
Lebesgue measure: every point has measure zero, but sets can have positive measure. Of course,
the only sets which can have positive probability in $\Omega_\infty$ are those which contain uncountably many
elements.


In the infinite coin toss space, we define a sequence of random variables $Y_1, Y_2, \ldots$ by

$$Y_k(\omega) \triangleq \begin{cases} 1 & \text{if } \omega_k = H, \\ 0 & \text{if } \omega_k = T, \end{cases}$$

and we also define the random variable

$$X(\omega) = \sum_{k=1}^{\infty} \frac{Y_k(\omega)}{2^k}.$$

Since each $Y_k$ is either zero or one, $X$ takes values in the interval $[0, 1]$. Indeed, $X(TTTT\cdots) = 0$,
$X(HHHH\cdots) = 1$ and the other values of $X$ lie in between. We define a “dyadic rational
number” to be a number of the form $\frac{m}{2^k}$, where $k$ and $m$ are integers. For example, $\frac{3}{4}$ is a dyadic
rational. Every dyadic rational in $(0, 1)$ corresponds to two sequences $\omega \in \Omega_\infty$. For example,

$$X(HHTTTTT\cdots) = X(HTHHHHH\cdots) = \frac{3}{4}.$$

The numbers in $(0, 1)$ which are not dyadic rationals correspond to a single $\omega \in \Omega_\infty$; these numbers
have a unique binary expansion.
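The double representation of dyadic rationals can be checked exactly for sequences that are eventually constant. In the Python sketch below, a sequence is described by a finite prefix followed by a single symbol repeated forever; this encoding and the function name `X` are our own illustrative choices. The infinite tail of heads is summed in closed form as a geometric series.

```python
from fractions import Fraction

def X(prefix, tail):
    """X(omega) for omega = prefix followed by the symbol `tail` repeated forever."""
    value = sum(
        Fraction(1, 2 ** k)
        for k, toss in enumerate(prefix, start=1)
        if toss == "H"
    )
    if tail == "H":
        # Geometric tail: the sum of 2^-k over k > len(prefix) equals 2^-len(prefix).
        value += Fraction(1, 2 ** len(prefix))
    return value

print(X("HH", "T"), X("HT", "H"))   # two sequences mapped to the same dyadic rational
print(X("", "T"), X("", "H"))       # the extreme values of X
```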




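Whenever we place a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$, we have a corresponding induced measure
$\mathcal{L}_X$ on $[0, 1]$. For example, if we set $p = q = \frac{1}{2}$ in the construction of this example, then we have

$$\mathcal{L}_X\left[0, \tfrac{1}{2}\right] = \mathbb{P}\{\text{First toss is } T\} = \tfrac{1}{2},$$
$$\mathcal{L}_X\left[\tfrac{1}{2}, 1\right] = \mathbb{P}\{\text{First toss is } H\} = \tfrac{1}{2},$$
$$\mathcal{L}_X\left[0, \tfrac{1}{4}\right] = \mathbb{P}\{\text{First two tosses are } TT\} = \tfrac{1}{4},$$
$$\mathcal{L}_X\left[\tfrac{1}{4}, \tfrac{1}{2}\right] = \mathbb{P}\{\text{First two tosses are } TH\} = \tfrac{1}{4},$$
$$\mathcal{L}_X\left[\tfrac{1}{2}, \tfrac{3}{4}\right] = \mathbb{P}\{\text{First two tosses are } HT\} = \tfrac{1}{4},$$
$$\mathcal{L}_X\left[\tfrac{3}{4}, 1\right] = \mathbb{P}\{\text{First two tosses are } HH\} = \tfrac{1}{4}.$$

Continuing this process, we can verify that for any positive integers $k$ and $m$ satisfying

$$0 \le \frac{m - 1}{2^k} < \frac{m}{2^k} \le 1,$$

we have
$$\mathcal{L}_X\left[\frac{m - 1}{2^k}, \frac{m}{2^k}\right] = \frac{1}{2^k}.$$

In other words, the $\mathcal{L}_X$-measure of all intervals in $[0, 1]$ whose endpoints are dyadic rationals is the
same as the Lebesgue measure of these intervals. The only way this can be is for $\mathcal{L}_X$ to be Lebesgue
measure.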


It is interesting to consider what $\mathcal{L}_X$ would look like if we take a value of $p$ other than $\frac{1}{2}$ when we
construct the probability measure $\mathbb{P}$ on $\Omega$.


We conclude this example with another look at the Cantor set of Example 1.4. Let $\Omega_{pairs}$ be the
subset of $\Omega$ in which every even-numbered toss is the same as the odd-numbered toss immediately
preceding it. For example, $HHTTTTHH$ is the beginning of a sequence in $\Omega_{pairs}$, but $HT$ is not.
Consider now the set of real numbers

$$C' \triangleq \{X(\omega);\ \omega \in \Omega_{pairs}\}.$$

The numbers in $\left(\frac{1}{4}, \frac{3}{4}\right)$ can be written as $X(\omega)$, but the sequence $\omega$ must begin with either
$TH$ or $HT$. Therefore, none of these numbers is in $C'$. Similarly, the numbers in $\left(\frac{1}{16}, \frac{3}{16}\right)$
can be written as $X(\omega)$, but the sequence $\omega$ must begin with $TTTH$ or $TTHT$, so none of these
numbers is in $C'$. Continuing this process, we see that $C'$ will not contain any of the numbers which
were removed in the construction of the Cantor set $C$ in Example 1.4. In other words, $C' \subset C$.
With a bit more work, one can convince oneself that in fact $C' = C$, i.e., by requiring consecutive
coin tosses to be paired, we are removing exactly those points in $[0, 1]$ which were removed in the
Cantor set construction of Example 1.4. $\Box$
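The correspondence between paired tosses and the Cantor construction can be tested stage by stage. The Python sketch below (the helper names `cantor_pieces` and `paired_intervals` are ours) compares the pieces left after $k$ middle-half removals with the ranges of $X$ over sequences whose first $2k$ tosses come in equal pairs; the tosses after position $2k$ are left unrestricted, which gives an interval of length $4^{-k}$ for each paired prefix.

```python
from fractions import Fraction
from itertools import product

def cantor_pieces(k):
    """Closed intervals remaining after k middle-half removals from [0, 1]."""
    pieces = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        nxt = []
        for a, b in pieces:
            quarter = (b - a) / 4
            nxt += [(a, a + quarter), (b - quarter, b)]
        pieces = nxt
    return pieces

def paired_intervals(k):
    """Range of X over sequences whose first 2k tosses come in equal pairs."""
    out = []
    for pairs in product("TH", repeat=k):
        prefix = "".join(t * 2 for t in pairs)
        low = sum(Fraction(1, 2 ** i) for i, t in enumerate(prefix, start=1) if t == "H")
        # All-T tail gives `low`; all-H tail adds 2^-len(prefix).
        out.append((low, low + Fraction(1, 2 ** len(prefix))))
    return out

print(all(sorted(paired_intervals(k)) == cantor_pieces(k) for k in range(1, 6)))
```

This only checks finitely many stages, so it illustrates rather than proves the inclusion discussed above.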




In addition to tossing a coin, another common random experiment is to pick a number, perhaps 
using a random number generator. Here are some probability spaces which correspond to different 
ways of picking a number at random. 


Example 1.11

Suppose we choose a number from $\mathbb{R}$ in such a way that we are sure to get either 1, 4 or 16.
Furthermore, we construct the experiment so that the probability of getting 1 is $\frac{4}{9}$, the probability of
getting 4 is $\frac{4}{9}$ and the probability of getting 16 is $\frac{1}{9}$. We describe this random experiment by taking
$\Omega$ to be $\mathbb{R}$, $\mathcal{F}$ to be $\mathcal{B}(\mathbb{R})$, and setting up the probability measure so that

$$\mathbb{P}\{1\} = \frac{4}{9}, \quad \mathbb{P}\{4\} = \frac{4}{9}, \quad \mathbb{P}\{16\} = \frac{1}{9}.$$

This determines $\mathbb{P}(A)$ for every set $A \in \mathcal{B}(\mathbb{R})$. For example, the probability of the interval $(0, 5]$
is $\frac{8}{9}$, because this interval contains the numbers 1 and 4, but not the number 16.

The probability measure described in this example is $\mathcal{L}_{S_2}$, the measure induced by the stock price
$S_2$, when the initial stock price $S_0 = 4$ and the probability of $H$ is $\frac{1}{3}$. This distribution was discussed
immediately following Definition 1.6. $\Box$
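A measure of this kind, concentrated on finitely many points, is determined by its point masses. The short Python sketch below assumes the masses $\frac{4}{9}$, $\frac{4}{9}$, $\frac{1}{9}$ at 1, 4 and 16, and represents a Borel set by a membership test; both the representation and the name `P` are ours.

```python
from fractions import Fraction

# Assumed point masses of the measure in Example 1.11.
masses = {1: Fraction(4, 9), 4: Fraction(4, 9), 16: Fraction(1, 9)}

def P(contains):
    """Probability of a set, given as a membership test on real numbers."""
    return sum(m for x, m in masses.items() if contains(x))

print(P(lambda x: 0 < x <= 5))     # the interval (0, 5]
print(P(lambda x: x > 5))          # its complement within the positive half-line
```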


Example 1.12 Uniform distribution on $[0, 1]$.

Let $\Omega = [0, 1]$ and let $\mathcal{F} = \mathcal{B}([0, 1])$, the collection of all Borel subsets contained in $[0, 1]$. For
each Borel set $A \subset [0, 1]$, we define $\mathbb{P}(A) = \mu_0(A)$ to be the Lebesgue measure of the set. Because
$\mu_0[0, 1] = 1$, this gives us a probability measure.

This probability space corresponds to the random experiment of choosing a number from $[0, 1]$ so
that every number is “equally likely” to be chosen. Since there are infinitely many numbers in $[0, 1]$,
this requires that every number have probability zero of being chosen. Nonetheless, we can speak of
the probability that the number chosen lies in a particular set, and if the set has uncountably many
points, then this probability can be positive. $\Box$


I know of no way to design a physical experiment which corresponds to choosing a number at 
random from [0, 1] so that each number is equally likely to be chosen, just as I know of no way to 
toss a coin infinitely many times. Nonetheless, both Examples 1.10 and 1.12 provide probability 
spaces which are often useful approximations to reality. 


Example 1.13 Standard normal distribution. 
Define the standard normal density

$$\varphi(x) \triangleq \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}.$$

Let $\Omega = \mathbb{R}$, $\mathcal{F} = \mathcal{B}(\mathbb{R})$ and for every Borel set $A \subset \mathbb{R}$, define

$$\mathbb{P}(A) \triangleq \int_A \varphi \, d\mu_0. \qquad (4.2)$$

If $A$ in (4.2) is an interval $[a, b]$, then we can write (4.2) as the less mysterious Riemann integral:

$$\mathbb{P}[a,b] \triangleq \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \, dx.$$

This corresponds to choosing a point at random on the real line, and every single point has probability
zero of being chosen, but if a set $A$ is given, then the probability the point is in that set is given
by (4.2).
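For an interval, the Riemann integral above can be approximated numerically. The following is a minimal sketch, not part of the original notes: it compares a midpoint Riemann sum of the standard normal density with the closed form obtained from the error function in Python's standard library. The function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def prob_interval_riemann(a, b, n=100_000):
    """Midpoint Riemann sum for the integral of phi over [a, b]."""
    h = (b - a) / n
    return h * sum(phi(a + (i + 0.5) * h) for i in range(n))


def prob_interval_erf(a, b):
    """Same probability via the normal distribution function, written with erf."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(b) - cdf(a)


print(prob_interval_riemann(-1.0, 1.0), prob_interval_erf(-1.0, 1.0))
# A degenerate interval [a, a] gets probability zero, as the text says of single points.
print(prob_interval_riemann(0.3, 0.3))
```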


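The construction of the integral in a general probability space follows the same steps as the
construction of the Lebesgue integral. We repeat this construction below.

Definition 1.14 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $X$ be a random variable on this space,
i.e., a mapping from $\Omega$ to $\mathbb{R}$, possibly also taking the values $\pm\infty$.

• If $X$ is an indicator, i.e.,

$$X(\omega) = \mathbb{I}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$

for some set $A \in \mathcal{F}$, we define

$$\int_\Omega X \, d\mathbb{P} \triangleq \mathbb{P}(A).$$

• If $X$ is a simple function, i.e.,

$$X(\omega) = \sum_{k=1}^n c_k \mathbb{I}_{A_k}(\omega),$$

where each $c_k$ is a real number and each $A_k$ is a set in $\mathcal{F}$, we define

$$\int_\Omega X \, d\mathbb{P} \triangleq \sum_{k=1}^n c_k \int_\Omega \mathbb{I}_{A_k} \, d\mathbb{P} = \sum_{k=1}^n c_k \mathbb{P}(A_k).$$

• If $X$ is nonnegative but otherwise general, we define

$$\int_\Omega X \, d\mathbb{P} \triangleq \sup\left\{ \int_\Omega Y \, d\mathbb{P};\ Y \text{ is simple and } Y(\omega) \le X(\omega) \text{ for every } \omega \in \Omega \right\}.$$

In fact, we can always construct a sequence of simple functions $Y_n$, $n = 1, 2, \ldots$ such that

$$0 \le Y_1(\omega) \le Y_2(\omega) \le Y_3(\omega) \le \cdots \text{ for every } \omega \in \Omega,$$

and $X(\omega) = \lim_{n\to\infty} Y_n(\omega)$ for every $\omega \in \Omega$. With this sequence, we can define

$$\int_\Omega X \, d\mathbb{P} \triangleq \lim_{n\to\infty} \int_\Omega Y_n \, d\mathbb{P}.$$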



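• If $X$ is integrable, i.e.,

$$\int_\Omega X^+ \, d\mathbb{P} < \infty, \qquad \int_\Omega X^- \, d\mathbb{P} < \infty,$$

where

$$X^+(\omega) \triangleq \max\{X(\omega), 0\}, \qquad X^-(\omega) \triangleq \max\{-X(\omega), 0\},$$

then we define

$$\int_\Omega X \, d\mathbb{P} \triangleq \int_\Omega X^+ \, d\mathbb{P} - \int_\Omega X^- \, d\mathbb{P}.$$

If $A$ is a set in $\mathcal{F}$ and $X$ is a random variable, we define

$$\int_A X \, d\mathbb{P} \triangleq \int_\Omega \mathbb{I}_A \cdot X \, d\mathbb{P}.$$

The expectation of a random variable $X$ is defined to be

$$\mathbb{E}X \triangleq \int_\Omega X \, d\mathbb{P}.$$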


The above integral has all the linearity and comparison properties one would expect. In particular,
if $X$ and $Y$ are random variables and $c$ is a real constant, then

$$\int_\Omega (X + Y) \, d\mathbb{P} = \int_\Omega X \, d\mathbb{P} + \int_\Omega Y \, d\mathbb{P},$$

$$\int_\Omega cX \, d\mathbb{P} = c \int_\Omega X \, d\mathbb{P}.$$

If $X(\omega) \le Y(\omega)$ for every $\omega \in \Omega$, then

$$\int_\Omega X \, d\mathbb{P} \le \int_\Omega Y \, d\mathbb{P}.$$

In fact, we don't need to have $X(\omega) \le Y(\omega)$ for every $\omega \in \Omega$ in order to reach this conclusion; it is
enough if the set of $\omega$ for which $X(\omega) \le Y(\omega)$ has probability one. When a condition holds with
probability one, we say it holds almost surely. Finally, if $A$ and $B$ are disjoint subsets of $\Omega$ and $X$
is a random variable, then

$$\int_{A \cup B} X \, d\mathbb{P} = \int_A X \, d\mathbb{P} + \int_B X \, d\mathbb{P}.$$

We restate the Lebesgue integral convergence theorems in this more general context. We acknowledge
in these statements that conditions don't need to hold for every $\omega$; almost surely is enough.


Theorem 4.4 (Fatou's Lemma) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of almost surely nonnegative
random variables converging almost surely to a random variable $X$. Then

$$\int_\Omega X \, d\mathbb{P} \le \liminf_{n\to\infty} \int_\Omega X_n \, d\mathbb{P},$$

or equivalently,

$$\mathbb{E}X \le \liminf_{n\to\infty} \mathbb{E}X_n.$$

Theorem 4.5 (Monotone Convergence Theorem) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of random
variables converging almost surely to a random variable $X$. Assume that

$$0 \le X_1 \le X_2 \le X_3 \le \cdots \text{ almost surely.}$$

Then

$$\int_\Omega X \, d\mathbb{P} = \lim_{n\to\infty} \int_\Omega X_n \, d\mathbb{P},$$

or equivalently,

$$\mathbb{E}X = \lim_{n\to\infty} \mathbb{E}X_n.$$

Theorem 4.6 (Dominated Convergence Theorem) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of random
variables, converging almost surely to a random variable $X$. Assume that there exists an integrable
random variable $Y$ such that

$$|X_n| \le Y \text{ almost surely for every } n.$$

Then

$$\int_\Omega X \, d\mathbb{P} = \lim_{n\to\infty} \int_\Omega X_n \, d\mathbb{P},$$

or equivalently,

$$\mathbb{E}X = \lim_{n\to\infty} \mathbb{E}X_n.$$


In Example 1.13, we constructed a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ by integrating the standard
normal density. In fact, whenever $\varphi$ is a nonnegative function defined on $\mathbb{R}$ satisfying $\int_{\mathbb{R}} \varphi \, d\mu_0 = 1$,
we call $\varphi$ a density and we can define an associated probability measure by

$$\mathbb{P}(A) \triangleq \int_A \varphi \, d\mu_0 \text{ for every } A \in \mathcal{B}(\mathbb{R}). \qquad (4.3)$$

We shall often have a situation in which two measures are related by an equation like (4.3). In fact,
the market measure and the risk-neutral measures in financial markets are related this way. We say
that $\varphi$ in (4.3) is the Radon-Nikodym derivative of $d\mathbb{P}$ with respect to $d\mu_0$, and we write

$$\varphi = \frac{d\mathbb{P}}{d\mu_0}. \qquad (4.4)$$

The probability measure $\mathbb{P}$ weights different parts of the real line according to the density $\varphi$. Now
suppose $f$ is a function on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathbb{P})$. Definition 1.14 gives us a value for the abstract integral

$$\int_{\mathbb{R}} f \, d\mathbb{P}.$$

We can also evaluate

$$\int_{\mathbb{R}} f \varphi \, d\mu_0,$$

which is an integral with respect to Lebesgue measure over the real line. We want to show that

$$\int_{\mathbb{R}} f \, d\mathbb{P} = \int_{\mathbb{R}} f \varphi \, d\mu_0, \qquad (4.5)$$

an equation which is suggested by the notation introduced in (4.4) (substitute $\frac{d\mathbb{P}}{d\mu_0}$ for $\varphi$ in (4.5) and
“cancel” the $d\mu_0$). We include a proof of this because it allows us to illustrate the concept of the
standard machine explained in Williams's book in Section 5.12, page 56.


The standard machine argument proceeds in four steps. 


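Step 1. Assume that $f$ is an indicator function, i.e., $f(x) = \mathbb{I}_A(x)$ for some Borel set $A \subset \mathbb{R}$. In
that case, (4.5) becomes

$$\mathbb{P}(A) = \int_A \varphi \, d\mu_0.$$

This is true because it is the definition of $\mathbb{P}(A)$.

Step 2. Now that we know that (4.5) holds when $f$ is an indicator function, assume that $f$ is a
simple function, i.e., a linear combination of indicator functions. In other words,

$$f(x) = \sum_{k=1}^n c_k h_k(x),$$

where each $c_k$ is a real number and each $h_k$ is an indicator function. Then

$$\int_{\mathbb{R}} f \, d\mathbb{P} = \int_{\mathbb{R}} \left[ \sum_{k=1}^n c_k h_k \right] d\mathbb{P} = \sum_{k=1}^n c_k \int_{\mathbb{R}} h_k \, d\mathbb{P} = \sum_{k=1}^n c_k \int_{\mathbb{R}} h_k \varphi \, d\mu_0 = \int_{\mathbb{R}} \left[ \sum_{k=1}^n c_k h_k \right] \varphi \, d\mu_0 = \int_{\mathbb{R}} f \varphi \, d\mu_0.$$

Step 3. Now that we know that (4.5) holds when $f$ is a simple function, we consider a general
nonnegative function $f$. We can always construct a sequence of nonnegative simple functions
$f_n$, $n = 1, 2, \ldots$ such that

$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \cdots \text{ for every } x \in \mathbb{R},$$

and $f(x) = \lim_{n\to\infty} f_n(x)$ for every $x \in \mathbb{R}$. We have already proved that

$$\int_{\mathbb{R}} f_n \, d\mathbb{P} = \int_{\mathbb{R}} f_n \varphi \, d\mu_0 \text{ for every } n.$$

We let $n \to \infty$ and use the Monotone Convergence Theorem on both sides of this equality to
get

$$\int_{\mathbb{R}} f \, d\mathbb{P} = \int_{\mathbb{R}} f \varphi \, d\mu_0.$$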




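Step 4. In the last step, we consider an integrable function $f$, which can take both positive and
negative values. By integrable, we mean that

$$\int_{\mathbb{R}} f^+ \, d\mathbb{P} < \infty, \qquad \int_{\mathbb{R}} f^- \, d\mathbb{P} < \infty.$$

From Step 3, we have

$$\int_{\mathbb{R}} f^+ \, d\mathbb{P} = \int_{\mathbb{R}} f^+ \varphi \, d\mu_0, \qquad \int_{\mathbb{R}} f^- \, d\mathbb{P} = \int_{\mathbb{R}} f^- \varphi \, d\mu_0.$$

Subtracting these two equations, we obtain the desired result:

$$\int_{\mathbb{R}} f \, d\mathbb{P} = \int_{\mathbb{R}} f^+ \, d\mathbb{P} - \int_{\mathbb{R}} f^- \, d\mathbb{P} = \int_{\mathbb{R}} f^+ \varphi \, d\mu_0 - \int_{\mathbb{R}} f^- \varphi \, d\mu_0 = \int_{\mathbb{R}} f \varphi \, d\mu_0.$$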

1.5 Independence 


In this section, we define and discuss the notion of independence in a general probability space
$(\Omega, \mathcal{F}, \mathbb{P})$, although most of the examples we give will be for coin toss space.


1.5.1 Independence of sets


Definition 1.15 We say that two sets $A \in \mathcal{F}$ and $B \in \mathcal{F}$ are independent if

$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B).$$

Suppose a random experiment is conducted, and $\omega$ is the outcome. The probability that $\omega \in A$ is
$\mathbb{P}(A)$. Suppose you are not told $\omega$, but you are told that $\omega \in B$. Conditional on this information,
the probability that $\omega \in A$ is (assuming $\mathbb{P}(B) > 0$)

$$\mathbb{P}(A|B) \triangleq \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$

The sets $A$ and $B$ are independent if and only if this conditional probability is the unconditional
probability $\mathbb{P}(A)$, i.e., knowing that $\omega \in B$ does not change the probability you assign to $A$. This
discussion is symmetric with respect to $A$ and $B$; if $A$ and $B$ are independent and you know that
$\omega \in A$, the conditional probability you assign to $B$ is still the unconditional probability $\mathbb{P}(B)$.


Whether two sets are independent depends on the probability measure $\mathbb{P}$. For example, suppose we
toss a coin twice, with probability $p$ for $H$ and probability $q = 1 - p$ for $T$ on each toss. To avoid
trivialities, we assume that $0 < p < 1$. Then

$$\mathbb{P}\{HH\} = p^2, \qquad \mathbb{P}\{HT\} = \mathbb{P}\{TH\} = pq, \qquad \mathbb{P}\{TT\} = q^2. \qquad (5.1)$$

Let $A = \{HH, HT\}$ and $B = \{HT, TH\}$. In words, $A$ is the set “$H$ on the first toss” and $B$ is the
set “one $H$ and one $T$.” Then $A \cap B = \{HT\}$. We compute

$$\mathbb{P}(A) = p^2 + pq = p, \qquad \mathbb{P}(B) = 2pq, \qquad \mathbb{P}(A)\mathbb{P}(B) = 2p^2 q, \qquad \mathbb{P}(A \cap B) = pq.$$

These sets are independent if and only if $2p^2 q = pq$, which is the case if and only if $p = \frac{1}{2}$.

If $p = \frac{1}{2}$, then $\mathbb{P}(B)$, the probability of one head and one tail, is $\frac{1}{2}$. If you are told that the coin
tosses resulted in a head on the first toss, the probability of $B$, which is now the probability of a $T$
on the second toss, is still $\frac{1}{2}$.

Suppose however that $p = 0.01$. By far the most likely outcome of the two coin tosses is $TT$, and
the probability of one head and one tail is quite small; in fact, $\mathbb{P}(B) = 0.0198$. However, if you
are told that the first toss resulted in $H$, it becomes very likely that the two tosses result in one head
and one tail. In fact, conditioned on getting an $H$ on the first toss, the probability of one $H$ and one
$T$ is the probability of a $T$ on the second toss, which is 0.99.
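The dependence of independence on the measure can be verified by brute force on the two-toss space. This is a small sketch with hypothetical helper names; it uses exact fractions so that the equality test in the definition of independence is meaningful.

```python
from fractions import Fraction


def two_toss_measure(p):
    """Product measure on two tosses with probability p of H on each toss."""
    q = 1 - p
    return {"HH": p * p, "HT": p * q, "TH": q * p, "TT": q * q}


def prob(measure, event):
    return sum(measure[omega] for omega in event)


def are_independent(measure, a, b):
    return prob(measure, a & b) == prob(measure, a) * prob(measure, b)


A = {"HH", "HT"}   # H on the first toss
B = {"HT", "TH"}   # one H and one T

fair = two_toss_measure(Fraction(1, 2))
biased = two_toss_measure(Fraction(1, 100))

print(are_independent(fair, A, B))
print(are_independent(biased, A, B))
print(prob(biased, B))                        # unconditional probability of B
print(prob(biased, A & B) / prob(biased, A))  # probability of B given A
```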


1.5.2 Independence of σ-algebras


Definition 1.16 Let $\mathcal{G}$ and $\mathcal{H}$ be sub-σ-algebras of $\mathcal{F}$. We say that $\mathcal{G}$ and $\mathcal{H}$ are independent if every
set in $\mathcal{G}$ is independent of every set in $\mathcal{H}$, i.e.,

$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B) \text{ for every } A \in \mathcal{H},\ B \in \mathcal{G}.$$

Example 1.14 Toss a coin twice, and let $\mathbb{P}$ be given by (5.1). Let $\mathcal{G} = \mathcal{F}_1$ be the σ-algebra
determined by the first toss: $\mathcal{G}$ contains the sets

$$\emptyset,\ \Omega,\ \{HH, HT\},\ \{TH, TT\}.$$

Let $\mathcal{H}$ be the σ-algebra determined by the second toss: $\mathcal{H}$ contains the sets

$$\emptyset,\ \Omega,\ \{HH, TH\},\ \{HT, TT\}.$$

These two σ-algebras are independent. For example, if we choose the set $\{HH, HT\}$ from $\mathcal{G}$ and
the set $\{HH, TH\}$ from $\mathcal{H}$, then we have

$$\mathbb{P}\{HH, HT\}\,\mathbb{P}\{HH, TH\} = (p^2 + pq)(p^2 + pq) = p^2,$$
$$\mathbb{P}(\{HH, HT\} \cap \{HH, TH\}) = \mathbb{P}\{HH\} = p^2.$$

No matter which set we choose in $\mathcal{G}$ and which set we choose in $\mathcal{H}$, we will find that the product of
the probabilities is the probability of the intersection.




Example 1.14 illustrates the general principle that when the probability for a sequence of tosses is 
defined to be the product of the probabilities for the individual tosses of the sequence, then every 
set depending on a particular toss will be independent of every set depending on a different toss. 
We say that the different tosses are independent when we construct probabilities this way. It is also 
possible to construct probabilities such that the different tosses are not independent, as shown by 
the following example. 


Example 1.15 Define $\mathbb{P}$ for the individual elements of $\Omega = \{HH, HT, TH, TT\}$ to be

$$\mathbb{P}\{HH\} = \frac{1}{9}, \qquad \mathbb{P}\{HT\} = \frac{2}{9}, \qquad \mathbb{P}\{TH\} = \frac{1}{3}, \qquad \mathbb{P}\{TT\} = \frac{1}{3},$$

and for every set $A \subset \Omega$, define $\mathbb{P}(A)$ to be the sum of the probabilities of the elements in $A$. Then
$\mathbb{P}(\Omega) = 1$, so $\mathbb{P}$ is a probability measure. Note that the sets $\{H \text{ on first toss}\} = \{HH, HT\}$ and
$\{H \text{ on second toss}\} = \{HH, TH\}$ have probabilities $\mathbb{P}\{HH, HT\} = \frac{1}{3}$ and $\mathbb{P}\{HH, TH\} = \frac{4}{9}$,
so the product of the probabilities is $\frac{4}{27}$. On the other hand, the intersection of $\{HH, HT\}$
and $\{HH, TH\}$ contains the single element $HH$, which has probability $\frac{1}{9}$. These sets are not
independent.


1.5.3 Independence of random variables 


Definition 1.17 We say that two random variables $X$ and $Y$ are independent if the σ-algebras they
generate, $\sigma(X)$ and $\sigma(Y)$, are independent.

In the probability space of three independent coin tosses, the price $S_2$ of the stock at time 2 is
independent of $\frac{S_3}{S_2}$. This is because $S_2$ depends on only the first two coin tosses, whereas $\frac{S_3}{S_2}$ is
either $u$ or $d$, depending on whether the third coin toss is $H$ or $T$.

Definition 1.17 says that for independent random variables $X$ and $Y$, every set defined in terms of
$X$ is independent of every set defined in terms of $Y$. In the case of $S_2$ and $\frac{S_3}{S_2}$ just considered, for
example, the sets $\{S_2 = udS_0\} = \{HTH, HTT, THH, THT\}$ and $\{\frac{S_3}{S_2} = u\} = \{HHH, HTH, THH, TTH\}$
are independent sets.


Suppose $X$ and $Y$ are independent random variables. We defined earlier the measure induced by $X$
on $\mathbb{R}$ to be

$$\mathcal{L}_X(A) \triangleq \mathbb{P}\{X \in A\}, \quad A \subset \mathbb{R}.$$

Similarly, the measure induced by $Y$ is

$$\mathcal{L}_Y(B) \triangleq \mathbb{P}\{Y \in B\}, \quad B \subset \mathbb{R}.$$

Now the pair $(X, Y)$ takes values in the plane $\mathbb{R}^2$, and we can define the measure induced by the
pair

$$\mathcal{L}_{X,Y}(C) \triangleq \mathbb{P}\{(X, Y) \in C\}, \quad C \subset \mathbb{R}^2.$$

The set $C$ in this last equation is a subset of the plane $\mathbb{R}^2$. In particular, $C$ could be a “rectangle”,
i.e., a set of the form $A \times B$, where $A \subset \mathbb{R}$ and $B \subset \mathbb{R}$. In this case,

$$\{(X, Y) \in A \times B\} = \{X \in A\} \cap \{Y \in B\},$$

and $X$ and $Y$ are independent if and only if

$$\mathcal{L}_{X,Y}(A \times B) = \mathbb{P}(\{X \in A\} \cap \{Y \in B\}) = \mathbb{P}\{X \in A\}\,\mathbb{P}\{Y \in B\} = \mathcal{L}_X(A)\,\mathcal{L}_Y(B). \qquad (5.2)$$

In other words, for independent random variables $X$ and $Y$, the joint distribution represented by the
measure $\mathcal{L}_{X,Y}$ factors into the product of the marginal distributions represented by the measures
$\mathcal{L}_X$ and $\mathcal{L}_Y$.


A joint density for $(X, Y)$ is a nonnegative function $f_{X,Y}(x, y)$ such that

$$\mathcal{L}_{X,Y}(A \times B) = \int_A \int_B f_{X,Y}(x, y) \, dy \, dx.$$

Not every pair of random variables $(X, Y)$ has a joint density, but if a pair does, then the random
variables $X$ and $Y$ have marginal densities defined by

$$f_X(x) \triangleq \int_{-\infty}^{\infty} f_{X,Y}(x, \eta) \, d\eta, \qquad f_Y(y) \triangleq \int_{-\infty}^{\infty} f_{X,Y}(\xi, y) \, d\xi.$$

These have the properties

$$\mathcal{L}_X(A) = \int_A f_X(x) \, dx, \quad A \subset \mathbb{R},$$

$$\mathcal{L}_Y(B) = \int_B f_Y(y) \, dy, \quad B \subset \mathbb{R}.$$

Suppose $X$ and $Y$ have a joint density. Then $X$ and $Y$ are independent variables if and only if
the joint density is the product of the marginal densities. This follows from the fact that (5.2) is
equivalent to independence of $X$ and $Y$. Take $A = (-\infty, x]$ and $B = (-\infty, y]$, write (5.2) in terms
of densities, and differentiate with respect to both $x$ and $y$.


Theorem 5.7 Suppose $X$ and $Y$ are independent random variables. Let $g$ and $h$ be functions from
$\mathbb{R}$ to $\mathbb{R}$. Then $g(X)$ and $h(Y)$ are also independent random variables.

PROOF: Let us denote $W = g(X)$ and $Z = h(Y)$. We must consider sets in $\sigma(W)$ and $\sigma(Z)$. But
a typical set in $\sigma(W)$ is of the form

$$\{\omega;\ W(\omega) \in A\} = \{\omega;\ g(X(\omega)) \in A\},$$

which is defined in terms of the random variable $X$. Therefore, this set is in $\sigma(X)$. (In general,
we have that every set in $\sigma(W)$ is also in $\sigma(X)$, which means that $X$ contains at least as much
information as $W$. In fact, $X$ can contain strictly more information than $W$, which means that $\sigma(X)$
will contain all the sets in $\sigma(W)$ and others besides; this is the case, for example, if $W = X^2$.)

In the same way that we just argued that every set in $\sigma(W)$ is also in $\sigma(X)$, we can show that
every set in $\sigma(Z)$ is also in $\sigma(Y)$. Since every set in $\sigma(X)$ is independent of every set in $\sigma(Y)$, we
conclude that every set in $\sigma(W)$ is independent of every set in $\sigma(Z)$.

Definition 1.18 Let $X_1, X_2, \ldots$ be a sequence of random variables. We say that these random
variables are independent if for every sequence of sets $A_1 \in \sigma(X_1), A_2 \in \sigma(X_2), \ldots$ and for every
positive integer $n$,

$$\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\,\mathbb{P}(A_2) \cdots \mathbb{P}(A_n).$$


1.5.4 Correlation and independence 


Theorem 5.8 If two random variables $X$ and $Y$ are independent, and if $g$ and $h$ are functions from
$\mathbb{R}$ to $\mathbb{R}$, then

$$\mathbb{E}[g(X)h(Y)] = \mathbb{E}g(X) \cdot \mathbb{E}h(Y),$$

provided all the expectations are defined.

PROOF: Let $g(x) = \mathbb{I}_A(x)$ and $h(y) = \mathbb{I}_B(y)$ be indicator functions. Then the equation we are
trying to prove becomes

$$\mathbb{P}(\{X \in A\} \cap \{Y \in B\}) = \mathbb{P}\{X \in A\}\,\mathbb{P}\{Y \in B\},$$

which is true because $X$ and $Y$ are independent. Now use the standard machine to get the result for
general functions $g$ and $h$.

The variance of a random variable $X$ is defined to be

$$\mathrm{Var}(X) \triangleq \mathbb{E}[X - \mathbb{E}X]^2.$$

The covariance of two random variables $X$ and $Y$ is defined to be

$$\mathrm{Cov}(X, Y) \triangleq \mathbb{E}[(X - \mathbb{E}X)(Y - \mathbb{E}Y)] = \mathbb{E}[XY] - \mathbb{E}X \cdot \mathbb{E}Y.$$

According to Theorem 5.8, for independent random variables, the covariance is zero. If $X$ and $Y$
both have positive variances, we define their correlation coefficient

$$\rho(X, Y) \triangleq \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

For independent random variables, the correlation coefficient is zero.


Unfortunately, two random variables can have zero correlation and still not be independent.
Consider the following example.

Example 1.16 Let $X$ be a standard normal random variable, let $Z$ be independent of $X$ and have
the distribution $\mathbb{P}\{Z = 1\} = \mathbb{P}\{Z = -1\} = \frac{1}{2}$. Define $Y = XZ$. We show that $Y$ is also a
standard normal random variable, $X$ and $Y$ are uncorrelated, but $X$ and $Y$ are not independent.

The last claim is easy to see. If $X$ and $Y$ were independent, so would be $X^2$ and $Y^2$, but in fact,
$X^2 = Y^2$ almost surely.

We next check that $Y$ is standard normal. For $y \in \mathbb{R}$, we have

$$\begin{aligned}
\mathbb{P}\{Y \le y\} &= \mathbb{P}\{Y \le y \text{ and } Z = 1\} + \mathbb{P}\{Y \le y \text{ and } Z = -1\} \\
&= \mathbb{P}\{X \le y \text{ and } Z = 1\} + \mathbb{P}\{-X \le y \text{ and } Z = -1\} \\
&= \mathbb{P}\{X \le y\}\,\mathbb{P}\{Z = 1\} + \mathbb{P}\{-X \le y\}\,\mathbb{P}\{Z = -1\} \\
&= \tfrac{1}{2}\,\mathbb{P}\{X \le y\} + \tfrac{1}{2}\,\mathbb{P}\{-X \le y\}.
\end{aligned}$$

Since $X$ is standard normal, $\mathbb{P}\{X \le y\} = \mathbb{P}\{-X \le y\}$, and we have $\mathbb{P}\{Y \le y\} = \mathbb{P}\{X \le y\}$,
which shows that $Y$ is also standard normal.

Being standard normal, both $X$ and $Y$ have expected value zero. Therefore,

$$\mathrm{Cov}(X, Y) = \mathbb{E}[XY] = \mathbb{E}[X^2 Z] = \mathbb{E}X^2 \cdot \mathbb{E}Z = 1 \cdot 0 = 0.$$

Where in $\mathbb{R}^2$ does the measure $\mathcal{L}_{X,Y}$ put its mass, i.e., what is the distribution of $(X, Y)$?
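A simulation makes the example concrete. The sketch below is illustrative only (the seed, the sample size and the event used to exhibit dependence are arbitrary choices, not from the notes): it draws a standard normal and an independent fair sign, forms their product, and compares the sample covariance with a simple dependence check based on the events where the absolute value exceeds 1.

```python
import random


def simulate_xz(n, seed=0):
    """Draw n pairs (X, Y) with X standard normal and Y = X * Z, Z a fair +/-1 sign."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = 1.0 if rng.random() < 0.5 else -1.0
        xs.append(x)
        ys.append(x * z)
    return xs, ys


def sample_cov(us, vs):
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n


xs, ys = simulate_xz(100_000)
cov_xy = sample_cov(xs, ys)

# Dependence check: if X and Y were independent, the joint frequency would be
# close to the product of the marginal frequencies.
p_x = sum(abs(x) > 1 for x in xs) / len(xs)
p_y = sum(abs(y) > 1 for y in ys) / len(ys)
p_joint = sum(abs(x) > 1 and abs(y) > 1 for x, y in zip(xs, ys)) / len(xs)
print(cov_xy, p_joint, p_x * p_y)
```

Because the two variables always have the same absolute value, the joint frequency equals each marginal frequency rather than their product, even though the sample covariance is close to zero.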


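We conclude this section with the observation that for independent random variables, the variance
of their sum is the sum of their variances. Indeed, if $X$ and $Y$ are independent and $Z = X + Y$,
then

$$\begin{aligned}
\mathrm{Var}(Z) &\triangleq \mathbb{E}\left[(Z - \mathbb{E}Z)^2\right] \\
&= \mathbb{E}\left[(X + Y - \mathbb{E}X - \mathbb{E}Y)^2\right] \\
&= \mathbb{E}\left[(X - \mathbb{E}X)^2 + 2(X - \mathbb{E}X)(Y - \mathbb{E}Y) + (Y - \mathbb{E}Y)^2\right] \\
&= \mathrm{Var}(X) + 2\,\mathbb{E}[X - \mathbb{E}X]\,\mathbb{E}[Y - \mathbb{E}Y] + \mathrm{Var}(Y) \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y).
\end{aligned}$$

This argument extends to any finite number of random variables. If we are given independent
random variables $X_1, X_2, \ldots, X_n$, then

$$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n). \qquad (5.3)$$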


1.5.5 Independence and conditional expectation. 


We now return to property (k) for conditional expectations, presented in the lecture dated October
19, 1995. The property as stated there is taken from Williams's book, page 88; we shall need only
the second assertion of the property:

(k) If a random variable $X$ is independent of a σ-algebra $\mathcal{H}$, then

$$\mathbb{E}[X|\mathcal{H}] = \mathbb{E}X.$$

The point of this statement is that if $X$ is independent of $\mathcal{H}$, then the best estimate of $X$ based on
the information in $\mathcal{H}$ is $\mathbb{E}X$, the same as the best estimate of $X$ based on no information.

To show this equality, we observe first that $\mathbb{E}X$ is $\mathcal{H}$-measurable, since it is not random. We must
also check the partial averaging property

$$\int_A \mathbb{E}X \, d\mathbb{P} = \int_A X \, d\mathbb{P} \text{ for every } A \in \mathcal{H}.$$

If $X$ is an indicator of some set $B$, which by assumption must be independent of $\mathcal{H}$, then the partial
averaging equation we must check is

$$\int_A \mathbb{P}(B) \, d\mathbb{P} = \int_A \mathbb{I}_B \, d\mathbb{P}.$$

The left-hand side of this equation is $\mathbb{P}(A)\mathbb{P}(B)$, and the right-hand side is

$$\int_\Omega \mathbb{I}_A \mathbb{I}_B \, d\mathbb{P} = \int_\Omega \mathbb{I}_{A \cap B} \, d\mathbb{P} = \mathbb{P}(A \cap B).$$

The partial averaging equation holds because $A$ and $B$ are independent. The partial averaging
equation for general $X$ independent of $\mathcal{H}$ follows by the standard machine.


1.5.6 Law of Large Numbers 


There are two fundamental theorems about sequences of independent random variables. Here is the 
first one. 


Theorem 5.9 (Law of Large Numbers) Let $X_1, X_2, \ldots$ be a sequence of independent, identically
distributed random variables, each with expected value $\mu$ and variance $\sigma^2$. Define the sequence of
averages

$$Y_n \triangleq \frac{X_1 + X_2 + \cdots + X_n}{n}, \quad n = 1, 2, \ldots.$$

Then $Y_n$ converges to $\mu$ almost surely as $n \to \infty$.

We are not going to give the proof of this theorem, but here is an argument which makes it plausible.
We will use this argument later when developing stochastic calculus. The argument proceeds in two
steps. We first check that $\mathbb{E}Y_n = \mu$ for every $n$. We next check that $\mathrm{Var}(Y_n) \to 0$ as $n \to \infty$. In
other words, the random variables $Y_n$ are increasingly tightly distributed around $\mu$ as $n \to \infty$.

For the first step, we simply compute

$$\mathbb{E}Y_n = \frac{1}{n}\left[\mathbb{E}X_1 + \mathbb{E}X_2 + \cdots + \mathbb{E}X_n\right] = \frac{1}{n}\underbrace{[\mu + \mu + \cdots + \mu]}_{n \text{ times}} = \mu.$$

For the second step, we first recall from (5.3) that the variance of the sum of independent random
variables is the sum of their variances. Therefore,

$$\mathrm{Var}(Y_n) = \sum_{k=1}^n \mathrm{Var}\left(\frac{X_k}{n}\right) = \sum_{k=1}^n \frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

As $n \to \infty$, we have $\mathrm{Var}(Y_n) \to 0$.
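The two steps of this plausibility argument can be observed in a simulation. The following sketch is an assumption-laden illustration: it uses fair plus-or-minus-one steps (so the mean is 0 and the variance is 1), a fixed seed, and 1000 independent replications per sample size, and compares the empirical variance of the averages with the reciprocal of the sample size.

```python
import random
import statistics


def sample_average(rng, n):
    """One realization of the average of n i.i.d. fair +/-1 steps."""
    return sum(rng.choice((-1.0, 1.0)) for _ in range(n)) / n


rng = random.Random(1)
REPS = 1000          # number of independent realizations per n (arbitrary choice)
results = {}
for n in (10, 100, 1000):
    averages = [sample_average(rng, n) for _ in range(REPS)]
    # Empirical mean and variance of the averages across the replications.
    results[n] = (statistics.mean(averages), statistics.pvariance(averages))
    print(n, results[n], 1.0 / n)
```

The empirical means stay near 0 for every sample size, while the empirical variances shrink roughly like the reciprocal of the sample size.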




1.5.7. Central Limit Theorem 


The Law of Large Numbers is a bit boring because the limit is nonrandom. This is because the
denominator in the definition of $Y_n$ is so large that the variance of $Y_n$ converges to zero. If we want
to prevent this, we should divide by $\sqrt{n}$ rather than $n$. In particular, if we again have a sequence of
independent, identically distributed random variables, each with expected value $\mu$ and variance $\sigma^2$,
but now we set

$$Z_n \triangleq \frac{(X_1 - \mu) + (X_2 - \mu) + \cdots + (X_n - \mu)}{\sqrt{n}},$$

then each $Z_n$ has expected value zero and

$$\mathrm{Var}(Z_n) = \sum_{k=1}^n \mathrm{Var}\left(\frac{X_k - \mu}{\sqrt{n}}\right) = \sum_{k=1}^n \frac{\sigma^2}{n} = \sigma^2.$$

As $n \to \infty$, the distributions of all the random variables $Z_n$ have the same degree of tightness, as
measured by their variance, around their expected value 0. The Central Limit Theorem asserts that
as $n \to \infty$, the distribution of $Z_n$ approaches that of a normal random variable with mean (expected
value) zero and variance $\sigma^2$. In other words, for every set $A \subset \mathbb{R}$,

$$\lim_{n\to\infty} \mathbb{P}\{Z_n \in A\} = \frac{1}{\sigma\sqrt{2\pi}} \int_A e^{-\frac{x^2}{2\sigma^2}} \, dx.$$
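The limit statement can be probed by simulation for one particular set, a half-line. This is a rough sketch under assumptions of my own choosing: fair plus-or-minus-one steps (mean 0, variance 1), 1000 steps per realization, 20000 realizations and a fixed seed. The number of +1 steps is read off as the count of one-bits in a block of random bits.

```python
import math
import random


def sample_z(rng, n):
    """One realization of the normalized sum of n fair +/-1 steps."""
    heads = bin(rng.getrandbits(n)).count("1")   # number of +1 steps among n
    return (2 * heads - n) / math.sqrt(n)


def normal_cdf(x, sigma=1.0):
    """Distribution function of a normal variable with mean 0 and variance sigma^2."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


rng = random.Random(7)
n, reps, a = 1000, 20_000, 1.0
# Empirical frequency of the event that the normalized sum lies in (-inf, a].
empirical = sum(sample_z(rng, n) <= a for _ in range(reps)) / reps
print(empirical, normal_cdf(a))
```

The two printed numbers should be close but not identical: the normalized sum is still a discrete random variable for finite n, and the frequency carries sampling error.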




Chapter 2 


Conditional Expectation 


Please see Hull’s book (Section 9.6.) 


2.1 A Binomial Model for Stock Price Dynamics 


Stock prices are assumed to follow this simple binomial model: The initial stock price during the
period under study is denoted $S_0$. At each time step, the stock price either goes up by a factor of $u$
or down by a factor of $d$. It will be useful to visualize tossing a coin at each time step, and say that

• the stock price moves up by a factor of $u$ if the coin comes out heads ($H$), and

• down by a factor of $d$ if it comes out tails ($T$).

Note that we are not specifying the probability of heads here.

Consider a sequence of 3 tosses of the coin (see Fig. 2.1). The collection of all possible outcomes
(i.e., sequences of tosses of length 3) is

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

A typical sequence of $\Omega$ will be denoted $\omega$, and $\omega_k$ will denote the $k$th element in the sequence $\omega$.
We write $S_k(\omega)$ to denote the stock price at “time” $k$ (i.e., after $k$ tosses) under the outcome $\omega$. Note
that $S_k(\omega)$ depends only on $\omega_1, \omega_2, \ldots, \omega_k$. Thus in the 3-coin-toss example we write for instance,

$$S_1(\omega) \triangleq S_1(\omega_1, \omega_2, \omega_3) \triangleq S_1(\omega_1),$$

$$S_2(\omega) \triangleq S_2(\omega_1, \omega_2, \omega_3) \triangleq S_2(\omega_1, \omega_2).$$


Each $S_k$ is a random variable defined on the set $\Omega$. More precisely, let $\mathcal{F} = \mathcal{P}(\Omega)$. Then $\mathcal{F}$ is a
σ-algebra and $(\Omega, \mathcal{F})$ is a measurable space. Each $S_k$ is an $\mathcal{F}$-measurable function $\Omega \to \mathbb{R}$, that is,
$S_k^{-1}$ is a function $\mathcal{B} \to \mathcal{F}$ where $\mathcal{B}$ is the Borel σ-algebra on $\mathbb{R}$. We will see later that $S_k$ is in fact
measurable under a sub-σ-algebra of $\mathcal{F}$. Recall that the Borel σ-algebra $\mathcal{B}$ is the σ-algebra generated
by the open intervals of $\mathbb{R}$. In this course we will always deal with subsets of $\mathbb{R}$ that belong to $\mathcal{B}$.

Figure 2.1: A three coin period binomial model. The tree starts at $S_0$ and branches on $\omega_1, \omega_2, \omega_3$:
$S_1(H) = uS_0$, $S_1(T) = dS_0$; $S_2(HH) = u^2 S_0$, $S_2(HT) = S_2(TH) = udS_0$, $S_2(TT) = d^2 S_0$;
$S_3(HHH) = u^3 S_0$, $S_3(HHT) = S_3(HTH) = S_3(THH) = u^2 d S_0$,
$S_3(HTT) = S_3(THT) = S_3(TTH) = d^2 u S_0$, $S_3(TTT) = d^3 S_0$.


For any random variable $X$ defined on a sample space $\Omega$ and any $y \in \mathbb{R}$, we will use the notation:

$$\{X \le y\} \triangleq \{\omega \in \Omega;\ X(\omega) \le y\}.$$

The sets $\{X < y\}$, $\{X \ge y\}$, $\{X = y\}$, etc., are defined similarly. Similarly for any subset $B$ of $\mathbb{R}$,
we define

$$\{X \in B\} \triangleq \{\omega \in \Omega;\ X(\omega) \in B\}.$$


Assumption 2.1 u > d > 0. 


2.2 Information 


Definition 2.1 (Sets determined by the first $k$ tosses.) We say that a set $A \subset \Omega$ is determined by
the first $k$ coin tosses if, knowing only the outcome of the first $k$ tosses, we can decide whether the
outcome of all tosses is in $A$. In general we denote the collection of sets determined by the first $k$
tosses by $\mathcal{F}_k$. It is easy to check that $\mathcal{F}_k$ is a σ-algebra.

Note that the random variable $S_k$ is $\mathcal{F}_k$-measurable, for each $k = 1, 2, \ldots$.

Example 2.1 In the 3 coin-toss example, the collection $\mathcal{F}_1$ of sets determined by the first toss consists of:

1. $A_H \triangleq \{HHH, HHT, HTH, HTT\}$,
2. $A_T \triangleq \{THH, THT, TTH, TTT\}$,
3. $\emptyset$,
4. $\Omega$.

The collection $\mathcal{F}_2$ of sets determined by the first two tosses consists of:

1. $A_{HH} \triangleq \{HHH, HHT\}$,
2. $A_{HT} \triangleq \{HTH, HTT\}$,
3. $A_{TH} \triangleq \{THH, THT\}$,
4. $A_{TT} \triangleq \{TTH, TTT\}$,
5. The complements of the above sets,
6. Any union of the above sets (including the complements),
7. $\emptyset$ and $\Omega$.


Definition 2.2 (Information carried by a random variable.) Let $X$ be a random variable $\Omega \to \mathbb{R}$.
We say that a set $A \subset \Omega$ is determined by the random variable $X$ if, knowing only the value $X(\omega)$
of the random variable, we can decide whether or not $\omega \in A$. Another way of saying this is that for
every $y \in \mathbb{R}$, either $X^{-1}(y) \subset A$ or $X^{-1}(y) \cap A = \emptyset$. The collection of subsets of $\Omega$ determined
by $X$ is a σ-algebra, which we call the σ-algebra generated by $X$, and denote by $\sigma(X)$.

If the random variable $X$ takes finitely many different values, then $\sigma(X)$ is generated by the collection
of sets

$$\{X^{-1}(X(\omega)) \mid \omega \in \Omega\};$$

these sets are called the atoms of the σ-algebra $\sigma(X)$.

In general, if $X$ is a random variable $\Omega \to \mathbb{R}$, then $\sigma(X)$ is given by

$$\sigma(X) = \{X^{-1}(B);\ B \in \mathcal{B}\}.$$

Example 2.2 (Sets determined by $S_2$.) The σ-algebra generated by $S_2$ consists of the following sets:

1. $A_{HH} = \{HHH, HHT\} = \{\omega \in \Omega;\ S_2(\omega) = u^2 S_0\}$,
2. $A_{TT} = \{TTH, TTT\} = \{S_2 = d^2 S_0\}$,
3. $A_{HT} \cup A_{TH} = \{S_2 = udS_0\}$,
4. Complements of the above sets,
5. Any union of the above sets,
6. $\emptyset = \{S_2(\omega) \in \emptyset\}$,
7. $\Omega = \{S_2(\omega) \in \mathbb{R}\}$.
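For a random variable with finitely many values, the generated σ-algebra can be enumerated as all unions of atoms. The sketch below is illustrative: the numerical values of the initial price and the up and down factors are assumptions (any choice with distinct up and down factors gives the same sets), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

OMEGA = ["".join(w) for w in product("HT", repeat=3)]


def stock_price(omega, k, s0=Fraction(4), u=Fraction(2), d=Fraction(1, 2)):
    """Stock price after k tosses; s0, u, d are assumed illustrative values."""
    heads = omega[:k].count("H")
    return s0 * u**heads * d**(k - heads)


def atoms(random_variable):
    """Partition OMEGA into the level sets of the random variable."""
    groups = {}
    for omega in OMEGA:
        groups.setdefault(random_variable(omega), set()).add(omega)
    return [frozenset(g) for g in groups.values()]


def generated_sigma_algebra(random_variable):
    """All unions of atoms, including the empty union and the whole space."""
    parts = atoms(random_variable)
    sets = set()
    for r in range(len(parts) + 1):
        for combo in combinations(parts, r):
            sets.add(frozenset().union(*combo))
    return sets


s2 = lambda omega: stock_price(omega, 2)
sigma_s2 = generated_sigma_algebra(s2)
for event in sorted(sigma_s2, key=lambda e: (len(e), sorted(e))):
    print(sorted(event))
```

With three atoms there are eight sets in total. The set of outcomes beginning with HT is not among them, because the stock price after two tosses cannot distinguish HT from TH.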


2.3. Conditional Expectation 


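In order to talk about conditional expectation, we need to introduce a probability measure on our
coin-toss sample space $\Omega$. Let us define

• $p \in (0, 1)$ is the probability of $H$,

• $q \triangleq (1 - p)$ is the probability of $T$,

• the coin tosses are independent, so that, e.g., $\mathbb{P}(HHT) = p^2 q$, etc.

• $\mathbb{P}(A) \triangleq \sum_{\omega \in A} \mathbb{P}(\omega)$, $\forall A \subset \Omega$.

Definition 2.3 (Expectation.)

$$\mathbb{E}X \triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\omega).$$

If $A \subset \Omega$ then

$$\mathbb{I}_A(\omega) \triangleq \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$

and

$$\mathbb{E}(\mathbb{I}_A X) = \int_A X \, d\mathbb{P} = \sum_{\omega \in A} X(\omega)\,\mathbb{P}(\omega).$$

We can think of $\mathbb{E}(\mathbb{I}_A X)$ as a partial average of $X$ over the set $A$.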


2.3.1 An example 


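Let us estimate $S_1$, given $S_2$. Denote the estimate by $\mathbb{E}(S_1|S_2)$. From elementary probability,
$\mathbb{E}(S_1|S_2)$ is a random variable $Y$ whose value at $\omega$ is defined by

$$Y(\omega) = \mathbb{E}(S_1|S_2 = y),$$

where $y = S_2(\omega)$. Properties of $\mathbb{E}(S_1|S_2)$:

• $\mathbb{E}(S_1|S_2)$ should depend on $\omega$, i.e., it is a random variable.

• If the value of $S_2$ is known, then the value of $\mathbb{E}(S_1|S_2)$ should also be known. In particular,

  – If $\omega = HHH$ or $\omega = HHT$, then $S_2(\omega) = u^2 S_0$. If we know that $S_2(\omega) = u^2 S_0$, then
    even without knowing $\omega$, we know that $S_1(\omega) = uS_0$. We define

    $$\mathbb{E}(S_1|S_2)(HHH) = \mathbb{E}(S_1|S_2)(HHT) = uS_0.$$

  – If $\omega = TTT$ or $\omega = TTH$, then $S_2(\omega) = d^2 S_0$. If we know that $S_2(\omega) = d^2 S_0$, then
    even without knowing $\omega$, we know that $S_1(\omega) = dS_0$. We define

    $$\mathbb{E}(S_1|S_2)(TTT) = \mathbb{E}(S_1|S_2)(TTH) = dS_0.$$

  – If $\omega \in A = \{HTH, HTT, THH, THT\}$, then $S_2(\omega) = udS_0$. If we know $S_2(\omega) = udS_0$,
    then we do not know whether $S_1 = uS_0$ or $S_1 = dS_0$. We then take a weighted
    average:

    $$\mathbb{P}(A) = p^2 q + pq^2 + p^2 q + pq^2 = 2pq.$$

    Furthermore,

    $$\int_A S_1 \, d\mathbb{P} = p^2 q\, uS_0 + pq^2\, uS_0 + p^2 q\, dS_0 + pq^2\, dS_0 = pq(u + d)S_0.$$

    For $\omega \in A$ we define

    $$\mathbb{E}(S_1|S_2)(\omega) = \frac{\int_A S_1 \, d\mathbb{P}}{\mathbb{P}(A)} = \frac{1}{2}(u + d)S_0.$$

    Then

    $$\int_A \mathbb{E}(S_1|S_2) \, d\mathbb{P} = \int_A S_1 \, d\mathbb{P}.$$

In conclusion, we can write

$$\mathbb{E}(S_1|S_2)(\omega) = g(S_2(\omega)),$$

where

$$g(x) = \begin{cases} uS_0 & \text{if } x = u^2 S_0, \\ \frac{1}{2}(u + d)S_0 & \text{if } x = udS_0, \\ dS_0 & \text{if } x = d^2 S_0. \end{cases}$$

In other words, $\mathbb{E}(S_1|S_2)$ is random only through dependence on $S_2$. We also write

$$\mathbb{E}(S_1|S_2 = x) = g(x),$$

where $g$ is the function defined above.

The random variable $\mathbb{E}(S_1|S_2)$ has two fundamental properties:

• $\mathbb{E}(S_1|S_2)$ is $\sigma(S_2)$-measurable.

• For every set $A \in \sigma(S_2)$,

$$\int_A \mathbb{E}(S_1|S_2) \, d\mathbb{P} = \int_A S_1 \, d\mathbb{P}.$$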
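The weighted-average computation above can be reproduced mechanically on the three-toss space. The sketch below is an illustration with assumed parameter values (probability 2/5 of H, up factor 2, down factor 1/2, initial price 4) and hypothetical function names. It averages the time-1 price over each level set of the time-2 price and then checks partial averaging on the middle level set.

```python
from fractions import Fraction
from itertools import product

# Assumed illustrative parameters; none of these values come from the text.
P_HEAD = Fraction(2, 5)
U, D, S0 = Fraction(2), Fraction(1, 2), Fraction(4)

OMEGA = ["".join(w) for w in product("HT", repeat=3)]


def prob(omega):
    """Probability of a single outcome under independent tosses."""
    heads = omega.count("H")
    return P_HEAD**heads * (1 - P_HEAD)**(len(omega) - heads)


def stock(k):
    """Return the stock price at time k as a function of the outcome."""
    def s_k(omega):
        heads = omega[:k].count("H")
        return S0 * U**heads * D**(k - heads)
    return s_k


def cond_exp(x, y):
    """Conditional expectation of x given y: the weighted average of x on each level set of y."""
    def estimate(omega):
        atom = [w for w in OMEGA if y(w) == y(omega)]
        return sum(x(w) * prob(w) for w in atom) / sum(prob(w) for w in atom)
    return estimate


s1, s2 = stock(1), stock(2)
e_s1_given_s2 = cond_exp(s1, s2)
for omega in OMEGA:
    print(omega, s2(omega), e_s1_given_s2(omega))

# Partial averaging over the level set where the time-2 price equals u*d*S0.
A = [w for w in OMEGA if s2(w) == U * D * S0]
lhs = sum(e_s1_given_s2(w) * prob(w) for w in A)
rhs = sum(s1(w) * prob(w) for w in A)
```

On the middle level set the estimate does not depend on the assumed probability of H, which matches the formula with the factor one half.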


2.3.2 Definition of Conditional Expectation 


Please see Williams, p.83. 


Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. Let $X$ be a random variable
on $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mathbb{E}(X|\mathcal{G})$ is defined to be any random variable $Y$ that satisfies:

(a) $Y$ is $\mathcal{G}$-measurable,

(b) For every set $A \in \mathcal{G}$, we have the “partial averaging property”

$$\int_A Y \, d\mathbb{P} = \int_A X \, d\mathbb{P}.$$

Existence. There is always a random variable $Y$ satisfying the above properties (provided that
$\mathbb{E}|X| < \infty$), i.e., conditional expectations always exist.

Uniqueness. There can be more than one random variable $Y$ satisfying the above properties, but if
$Y'$ is another one, then $Y = Y'$ almost surely, i.e., $\mathbb{P}\{\omega \in \Omega;\ Y(\omega) = Y'(\omega)\} = 1$.

Notation 2.1 For random variables $X$, $Y$, it is standard notation to write

$$\mathbb{E}(X|Y) \triangleq \mathbb{E}(X|\sigma(Y)).$$

Here are some useful ways to think about $\mathbb{E}(X|\mathcal{G})$:

• A random experiment is performed, i.e., an element $\omega$ of $\Omega$ is selected. The value of $\omega$ is
partially but not fully revealed to us, and thus we cannot compute the exact value of $X(\omega)$.
Based on what we know about $\omega$, we compute an estimate of $X(\omega)$. Because this estimate
depends on the partial information we have about $\omega$, it depends on $\omega$, i.e., $\mathbb{E}(X|\mathcal{G})(\omega)$ is a
function of $\omega$, although the dependence on $\omega$ is often not shown explicitly.

• If the σ-algebra $\mathcal{G}$ contains finitely many sets, there will be a “smallest” set $A$ in $\mathcal{G}$ containing
$\omega$, which is the intersection of all sets in $\mathcal{G}$ containing $\omega$. The way $\omega$ is partially revealed to us
is that we are told it is in $A$, but not told which element of $A$ it is. We then define $\mathbb{E}(X|\mathcal{G})(\omega)$
to be the average (with respect to $\mathbb{P}$) value of $X$ over this set $A$. Thus, for all $\omega$ in this set $A$,
$\mathbb{E}(X|\mathcal{G})(\omega)$ will be the same.

2.3.3 Further discussion of Partial Averaging 


The partial averaging property is

$$\int_A \mathbb{E}(X|\mathcal{G}) \, d\mathbb{P} = \int_A X \, d\mathbb{P}, \quad \forall A \in \mathcal{G}. \qquad (3.1)$$

We can rewrite this as

$$\mathbb{E}[\mathbb{I}_A \mathbb{E}(X|\mathcal{G})] = \mathbb{E}[\mathbb{I}_A X]. \qquad (3.2)$$

Note that $\mathbb{I}_A$ is a $\mathcal{G}$-measurable random variable. In fact the following holds:

Lemma 3.10 If $V$ is any $\mathcal{G}$-measurable random variable, then provided $\mathbb{E}|V\,\mathbb{E}(X|\mathcal{G})| < \infty$,

$$\mathbb{E}[V\,\mathbb{E}(X|\mathcal{G})] = \mathbb{E}[VX]. \qquad (3.3)$$

Proof: To see this, first use (3.2) and linearity of expectations to prove (3.3) when $V$ is a simple
$\mathcal{G}$-measurable random variable, i.e., $V$ is of the form $V = \sum_{k=1}^n c_k \mathbb{I}_{A_k}$, where each $A_k$ is in $\mathcal{G}$ and
each $c_k$ is constant. Next consider the case that $V$ is a nonnegative $\mathcal{G}$-measurable random variable,
but is not necessarily simple. Such a $V$ can be written as the limit of an increasing sequence
of simple random variables $V_n$; we write (3.3) for each $V_n$ and then pass to the limit, using the
Monotone Convergence Theorem (see Williams), to obtain (3.3) for $V$. Finally, the general
$\mathcal{G}$-measurable random variable $V$ can be written as the difference of two nonnegative random variables
$V = V^+ - V^-$, and since (3.3) holds for $V^+$ and $V^-$ it must hold for $V$ as well. Williams calls
this argument the “standard machine” (p. 56).

Based on this lemma, we can replace the second condition in the definition of a conditional
expectation (Section 2.3.2) by:

(b') For every $\mathcal{G}$-measurable random variable $V$, we have

$$\mathbb{E}[V\,\mathbb{E}(X|\mathcal{G})] = \mathbb{E}[VX]. \qquad (3.4)$$


2.3.4 Properties of Conditional Expectation 


Please see Williams p. 88. Proof sketches of some of the properties are provided below.

(a) $\mathbb{E}(\mathbb{E}(X|\mathcal{G})) = \mathbb{E}(X)$.

Proof: Just take $A$ in the partial averaging property to be $\Omega$.

The conditional expectation of $X$ is thus an unbiased estimator of the random variable $X$.

(b) If $X$ is $\mathcal{G}$-measurable, then

$$\mathbb{E}(X|\mathcal{G}) = X.$$

Proof: The partial averaging property holds trivially when $Y$ is replaced by $X$. And since $X$
is $\mathcal{G}$-measurable, $X$ satisfies the requirement (a) of a conditional expectation as well.

If the information content of $\mathcal{G}$ is sufficient to determine $X$, then the best estimate of $X$ based
on $\mathcal{G}$ is $X$ itself.

(c) (Linearity)

$$\mathbb{E}(a_1 X_1 + a_2 X_2|\mathcal{G}) = a_1 \mathbb{E}(X_1|\mathcal{G}) + a_2 \mathbb{E}(X_2|\mathcal{G}).$$

(d) (Positivity) If $X \ge 0$ almost surely, then

$$\mathbb{E}(X|\mathcal{G}) \ge 0.$$

Proof: Take $A = \{\omega \in \Omega;\ \mathbb{E}(X|\mathcal{G})(\omega) < 0\}$. This set is in $\mathcal{G}$ since $\mathbb{E}(X|\mathcal{G})$ is $\mathcal{G}$-measurable.
Partial averaging implies $\int_A \mathbb{E}(X|\mathcal{G}) \, d\mathbb{P} = \int_A X \, d\mathbb{P}$. The right-hand side is greater than
or equal to zero, and the left-hand side is strictly negative, unless $\mathbb{P}(A) = 0$. Therefore,
$\mathbb{P}(A) = 0$.




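(h) (Jensen's Inequality) If $\phi : \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\phi(X)| < \infty$, then

$$\mathbb{E}(\phi(X)|\mathcal{G}) \ge \phi(\mathbb{E}(X|\mathcal{G})).$$

Recall the usual Jensen's Inequality: $\mathbb{E}\phi(X) \ge \phi(\mathbb{E}(X))$.

(i) (Tower Property) If $\mathcal{H}$ is a sub-σ-algebra of $\mathcal{G}$, then

$$\mathbb{E}(\mathbb{E}(X|\mathcal{G})|\mathcal{H}) = \mathbb{E}(X|\mathcal{H}).$$

$\mathcal{H}$ is a sub-σ-algebra of $\mathcal{G}$ means that $\mathcal{G}$ contains more information than $\mathcal{H}$. If we estimate $X$
based on the information in $\mathcal{G}$, and then estimate the estimator based on the smaller amount
of information in $\mathcal{H}$, then we get the same result as if we had estimated $X$ directly based on
the information in $\mathcal{H}$.

(j) (Taking out what is known) If $Z$ is $\mathcal{G}$-measurable, then

$$\mathbb{E}(ZX|\mathcal{G}) = Z\,\mathbb{E}(X|\mathcal{G}).$$

When conditioning on $\mathcal{G}$, the $\mathcal{G}$-measurable random variable $Z$ acts like a constant.

Proof: Let $Z$ be a $\mathcal{G}$-measurable random variable. A random variable $Y$ is $\mathbb{E}(ZX|\mathcal{G})$ if and
only if

(a) $Y$ is $\mathcal{G}$-measurable;
(b) $\int_A Y \, d\mathbb{P} = \int_A ZX \, d\mathbb{P}$, $\forall A \in \mathcal{G}$.

Take $Y = Z\,\mathbb{E}(X|\mathcal{G})$. Then $Y$ satisfies (a) (a product of $\mathcal{G}$-measurable random variables is
$\mathcal{G}$-measurable). $Y$ also satisfies property (b), as we can check below:

$$\begin{aligned}
\int_A Y \, d\mathbb{P} &= \mathbb{E}(\mathbb{I}_A Y) \\
&= \mathbb{E}[\mathbb{I}_A Z\,\mathbb{E}(X|\mathcal{G})] \\
&= \mathbb{E}[\mathbb{I}_A Z X] \quad \text{((b') with } V = \mathbb{I}_A Z\text{)} \\
&= \int_A ZX \, d\mathbb{P}.
\end{aligned}$$

(k) (Role of Independence) If $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$, then

$$\mathbb{E}(X|\sigma(\mathcal{G}, \mathcal{H})) = \mathbb{E}(X|\mathcal{G}).$$

In particular, if $X$ is independent of $\mathcal{H}$, then

$$\mathbb{E}(X|\mathcal{H}) = \mathbb{E}(X).$$

If $\mathcal{H}$ is independent of $X$ and $\mathcal{G}$, then nothing is gained by including the information content
of $\mathcal{H}$ in the estimation of $X$.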




2.3.5 Examples from the Binomial Model 


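Recall that \(\mathcal{F}_1 = \{\emptyset, A_H, A_T, \Omega\}\). Notice that \(\mathbb{E}(S_2|\mathcal{F}_1)\) must be constant on \(A_H\) and \(A_T\).
Now since \(\mathbb{E}(S_2|\mathcal{F}_1)\) must satisfy the partial averaging property,

\[ \int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\,d\mathbb{P} = \int_{A_H} S_2\,d\mathbb{P}, \]

\[ \int_{A_T} \mathbb{E}(S_2|\mathcal{F}_1)\,d\mathbb{P} = \int_{A_T} S_2\,d\mathbb{P}. \]

We compute

\[
\begin{aligned}
\int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\,d\mathbb{P} &= \mathbb{P}(A_H) \cdot \mathbb{E}(S_2|\mathcal{F}_1)(\omega) \\
&= p\,\mathbb{E}(S_2|\mathcal{F}_1)(\omega), \quad \forall \omega \in A_H.
\end{aligned}
\]

On the other hand,
\[ \int_{A_H} S_2\,d\mathbb{P} = p^2 u^2 S_0 + pq\,ud\,S_0. \]

Therefore,
\[ \mathbb{E}(S_2|\mathcal{F}_1)(\omega) = p u^2 S_0 + q\,ud\,S_0, \quad \forall \omega \in A_H. \]
We can also write
\[
\begin{aligned}
\mathbb{E}(S_2|\mathcal{F}_1)(\omega) &= p u^2 S_0 + q\,ud\,S_0 \\
&= (pu + qd)\,u S_0 \\
&= (pu + qd)\,S_1(\omega), \quad \forall \omega \in A_H.
\end{aligned}
\]

Similarly,
\[ \mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)\,S_1(\omega), \quad \forall \omega \in A_T. \]

Thus in both cases we have
\[ \mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)\,S_1(\omega), \quad \forall \omega \in \Omega. \]
A similar argument one time step later shows that
\[ \mathbb{E}(S_3|\mathcal{F}_2)(\omega) = (pu + qd)\,S_2(\omega). \]

We leave the verification of this equality as an exercise. We can also verify the Tower Property in this
setting. From the previous equations we have

\[
\begin{aligned}
\mathbb{E}[\mathbb{E}(S_3|\mathcal{F}_2)|\mathcal{F}_1] &= \mathbb{E}[(pu + qd)S_2|\mathcal{F}_1] \\
&= (pu + qd)\,\mathbb{E}(S_2|\mathcal{F}_1) \qquad \text{(linearity)} \\
&= (pu + qd)^2 S_1.
\end{aligned}
\]

This final expression is \(\mathbb{E}(S_3|\mathcal{F}_1)\).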
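The partial-averaging computation above can be checked numerically. The following is a minimal sketch, not part of the original notes: it builds the three-toss sample space, computes conditional expectations by averaging over the atoms of \(\mathcal{F}_k\), and compares the results with \((pu + qd)S_1\) and with the Tower Property. The function names `cond_exp` and `stock` and the parameter values are illustrative assumptions.

```python
from itertools import product

def cond_exp(X, k, p, n):
    """E[X | F_k] on n coin tosses: on each atom of F_k (paths sharing the
    first k tosses), return the probability-weighted average of X."""
    q = 1.0 - p
    prob = lambda w: p ** w.count("H") * q ** w.count("T")
    out = {}
    for w in product("HT", repeat=n):
        atom = [w[:k] + tail for tail in product("HT", repeat=n - k)]
        mass = sum(prob(a) for a in atom)
        out[w] = sum(X[a] * prob(a) for a in atom) / mass
    return out

def stock(k, S0, u, d, n):
    """S_k viewed as a random variable on the n-toss sample space."""
    return {w: S0 * u ** w[:k].count("H") * d ** w[:k].count("T")
            for w in product("HT", repeat=n)}

# Hypothetical parameters, chosen only for illustration.
p, u, d, S0, n = 0.6, 2.0, 0.5, 4.0, 3
q = 1.0 - p
S1, S2, S3 = (stock(k, S0, u, d, n) for k in (1, 2, 3))

E_S2_F1 = cond_exp(S2, 1, p, n)                          # E(S_2 | F_1)
tower_lhs = cond_exp(cond_exp(S3, 2, p, n), 1, p, n)     # E(E(S_3|F_2) | F_1)
E_S3_F1 = cond_exp(S3, 1, p, n)                          # E(S_3 | F_1)
```

On every path `w`, `E_S2_F1[w]` should agree with `(p*u + q*d) * S1[w]`, and `tower_lhs[w]` with `E_S3_F1[w]`, up to floating-point error.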


2.4 Martingales 


The ingredients are:

• A probability space \((\Omega, \mathcal{F}, \mathbb{P})\).

• A sequence of \(\sigma\)-algebras \(\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_n\), with the property that \(\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n \subset \mathcal{F}\).
Such a sequence of \(\sigma\)-algebras is called a filtration.

• A sequence of random variables \(M_0, M_1, \ldots, M_n\). This is called a stochastic process.

Conditions for a martingale:

1. Each \(M_k\) is \(\mathcal{F}_k\)-measurable. If you know the information in \(\mathcal{F}_k\), then you know the value of
\(M_k\). We say that the process \(\{M_k\}\) is adapted to the filtration \(\{\mathcal{F}_k\}\).

2. For each \(k\), \(\mathbb{E}(M_{k+1}|\mathcal{F}_k) = M_k\). Martingales tend to go neither up nor down.

A supermartingale tends to go down, i.e. the second condition above is replaced by \(\mathbb{E}(M_{k+1}|\mathcal{F}_k) \le M_k\);
a submartingale tends to go up, i.e. \(\mathbb{E}(M_{k+1}|\mathcal{F}_k) \ge M_k\).


Example 2.3 (Example from the binomial model.) For \(k = 1, 2\) we already showed that
\[ \mathbb{E}(S_{k+1}|\mathcal{F}_k) = (pu + qd)\,S_k. \]

For \(k = 0\), we set \(\mathcal{F}_0 = \{\emptyset, \Omega\}\), the "trivial \(\sigma\)-algebra". This \(\sigma\)-algebra contains no information, and any
\(\mathcal{F}_0\)-measurable random variable must be constant (nonrandom). Therefore, by definition, \(\mathbb{E}(S_1|\mathcal{F}_0)\) is that
constant which satisfies the averaging property

\[ \int_\Omega \mathbb{E}(S_1|\mathcal{F}_0)\,d\mathbb{P} = \int_\Omega S_1\,d\mathbb{P}. \]

The right hand side is \(\mathbb{E}S_1 = (pu + qd)S_0\), and so we have
\[ \mathbb{E}(S_1|\mathcal{F}_0) = (pu + qd)\,S_0. \]
In conclusion,

• If \((pu + qd) = 1\) then \(\{S_k, \mathcal{F}_k;\ k = 0, 1, 2, 3\}\) is a martingale.
• If \((pu + qd) \ge 1\) then \(\{S_k, \mathcal{F}_k;\ k = 0, 1, 2, 3\}\) is a submartingale.
• If \((pu + qd) \le 1\) then \(\{S_k, \mathcal{F}_k;\ k = 0, 1, 2, 3\}\) is a supermartingale.


Chapter 3 


Arbitrage Pricing 


3.1 Binomial Pricing 


Return to the binomial pricing model 


Please see: 


e Cox, Ross and Rubinstein, J. Financial Economics, 7(1979), 229-263, and 


e Cox and Rubinstein (1985), Options Markets, Prentice-Hall. 


Example 3.1 (Pricing a Call Option) Suppose \(u = 2\), \(d = 0.5\), \(r = 25\%\) (interest rate), \(S_0 = 50\). (In this
and all examples, the interest rate quoted is per unit time, and the stock prices \(S_0, S_1, \ldots\) are indexed by the
same time periods.) We know that

\[ S_1(\omega) = \begin{cases} 100 & \text{if } \omega_1 = H, \\ 25 & \text{if } \omega_1 = T. \end{cases} \]

Find the value at time zero of a call option to buy one share of stock at time 1 for $50 (i.e. the strike price is
$50).

The value of the call at time 1 is

\[ V_1(\omega) = (S_1(\omega) - 50)^+ = \begin{cases} 50 & \text{if } \omega_1 = H, \\ 0 & \text{if } \omega_1 = T. \end{cases} \]

Suppose the option sells for $20 at time 0. Let us construct a portfolio:

1. Sell 3 options for $20 each. Cash outlay is -$60.
2. Buy 2 shares of stock for $50 each. Cash outlay is $100.
3. Borrow $40. Cash outlay is -$40.

This portfolio thus requires no initial investment. For this portfolio, the cash outlay at time 1 is:

                      ω1 = H      ω1 = T
  Pay off option       $150         $0
  Sell stock          -$200       -$50
  Pay off debt          $50        $50
  Total                  $0         $0

The arbitrage pricing theory (APT) value of the option at time 0 is \(V_0 = 20\).


Assumptions underlying APT: 


e Unlimited short selling of stock. 
e Unlimited borrowing. 
e No transaction costs. 


e Agent is a “small investor’, i.e., his/her trading does not move the market. 


Important Observation: The APT value of the option does not depend on the probabilities of H 
and 7’. 


3.2 General one-step APT 


Suppose a derivative security pays off the amount \(V_1\) at time 1, where \(V_1\) is an \(\mathcal{F}_1\)-measurable
random variable. (This measurability condition is important; this is why it does not make sense
to use some stock unrelated to the derivative security in valuing it, at least in the straightforward
method described below.)

• Sell the security for \(V_0\) at time 0. (\(V_0\) is to be determined later.)

• Buy \(\Delta_0\) shares of stock at time 0. (\(\Delta_0\) is also to be determined later.)

• Invest \(V_0 - \Delta_0 S_0\) in the money market, at risk-free interest rate \(r\). (\(V_0 - \Delta_0 S_0\) might be
negative.)

• Then wealth at time 1 is
\[
\begin{aligned}
X_1 &= \Delta_0 S_1 + (1+r)(V_0 - \Delta_0 S_0) \\
&= (1+r)V_0 + \Delta_0 (S_1 - (1+r)S_0).
\end{aligned}
\]

• We want to choose \(V_0\) and \(\Delta_0\) so that
\[ X_1 = V_1 \]
regardless of whether the stock goes up or down.


The last condition above can be expressed by two equations (which is fortunate since there are two
unknowns):

\[ (1+r)V_0 + \Delta_0 (S_1(H) - (1+r)S_0) = V_1(H) \tag{2.1} \]

\[ (1+r)V_0 + \Delta_0 (S_1(T) - (1+r)S_0) = V_1(T) \tag{2.2} \]

Note that this is where we use the fact that the derivative security value \(V_k\) is a function of \(S_k\),
i.e., when \(S_k\) is known for a given \(\omega\), \(V_k\) is known (and therefore non-random) at that \(\omega\) as well.
Subtracting the second equation above from the first gives

\[ \Delta_0 = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)}. \tag{2.3} \]

Plug the formula (2.3) for \(\Delta_0\) into (2.1):

\[
\begin{aligned}
(1+r)V_0 &= V_1(H) - \Delta_0 (S_1(H) - (1+r)S_0) \\
&= V_1(H) - \frac{V_1(H) - V_1(T)}{(u - d)S_0}\,(u - 1 - r)S_0 \\
&= V_1(H) - (V_1(H) - V_1(T))\,\frac{u - 1 - r}{u - d} \\
&= \frac{1 + r - d}{u - d}\,V_1(H) + \frac{u - 1 - r}{u - d}\,V_1(T).
\end{aligned}
\]

We have already assumed \(u > d > 0\). We now also assume \(d \le 1 + r \le u\) (otherwise there would
be an arbitrage opportunity). Define

\[ \tilde{p} = \frac{1 + r - d}{u - d}, \qquad \tilde{q} = \frac{u - 1 - r}{u - d}. \]

Then \(\tilde{p} \ge 0\) and \(\tilde{q} \ge 0\). Since \(\tilde{p} + \tilde{q} = 1\), we have \(0 \le \tilde{p} \le 1\) and \(\tilde{q} = 1 - \tilde{p}\). Thus, \(\tilde{p}, \tilde{q}\) are like
probabilities. We will return to this later. Thus the price of the call at time 0 is given by

\[ V_0 = \frac{1}{1+r}\left[\tilde{p}\,V_1(H) + \tilde{q}\,V_1(T)\right]. \tag{2.4} \]
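The one-step replication argument can be sketched in code. This is an illustrative sketch rather than anything from the original notes: the function name `one_step_price` is an assumption, and the inputs are the numbers of Example 3.1. It solves (2.1) and (2.2) through formulas (2.3) and (2.4) and then checks that the resulting portfolio replicates the payoff in both states.

```python
def one_step_price(S0, u, d, r, V1_H, V1_T):
    """Return (V0, Delta0) solving the replication equations (2.1)-(2.2)."""
    if not d < 1 + r < u:
        raise ValueError("need d < 1 + r < u to rule out arbitrage")
    delta0 = (V1_H - V1_T) / ((u - d) * S0)      # equation (2.3)
    p_tilde = (1 + r - d) / (u - d)              # risk-neutral weights
    q_tilde = (u - 1 - r) / (u - d)
    V0 = (p_tilde * V1_H + q_tilde * V1_T) / (1 + r)   # equation (2.4)
    return V0, delta0

# Example 3.1: u = 2, d = 0.5, r = 25%, S0 = 50, call struck at 50.
V0, delta0 = one_step_price(S0=50, u=2, d=0.5, r=0.25, V1_H=50, V1_T=0)

# Wealth at time 1 in each state, starting from V0 and holding delta0 shares.
X1_H = delta0 * 100 + 1.25 * (V0 - delta0 * 50)
X1_T = delta0 * 25 + 1.25 * (V0 - delta0 * 50)
```

Here `V0` comes out as 20 and `delta0` as 2/3, which matches the portfolio of Example 3.1 (2 shares held against 3 options sold). The actual probabilities of H and T never enter the computation.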


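3.3 Risk-Neutral Probability Measure

Let \(\Omega\) be the set of possible outcomes from \(n\) coin tosses. Construct a probability measure \(\widetilde{\mathbb{P}}\) on \(\Omega\)
by the formula

\[ \widetilde{\mathbb{P}}(\omega_1, \omega_2, \ldots, \omega_n) = \tilde{p}^{\,\#\{j;\ \omega_j = H\}}\; \tilde{q}^{\,\#\{j;\ \omega_j = T\}}. \]

\(\widetilde{\mathbb{P}}\) is called the risk-neutral probability measure. We denote by \(\widetilde{\mathbb{E}}\) the expectation under \(\widetilde{\mathbb{P}}\).
Equation (2.4) says

\[ V_0 = \widetilde{\mathbb{E}}\left[\frac{1}{1+r}\,V_1\right]. \]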


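Theorem 3.11 Under \(\widetilde{\mathbb{P}}\), the discounted stock price process \(\{(1+r)^{-k}S_k, \mathcal{F}_k\}_{k=0}^n\) is a martingale.

Proof:

\[
\begin{aligned}
\widetilde{\mathbb{E}}[(1+r)^{-(k+1)}S_{k+1}|\mathcal{F}_k]
&= (1+r)^{-(k+1)}(\tilde{p}u + \tilde{q}d)\,S_k \\
&= (1+r)^{-(k+1)}\left(\frac{u(1+r-d)}{u-d} + \frac{d(u-1-r)}{u-d}\right)S_k \\
&= (1+r)^{-(k+1)}\,\frac{(u-d)(1+r)}{u-d}\,S_k \\
&= (1+r)^{-k}S_k.
\end{aligned}
\]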


3.3.1 Portfolio Process 


The portfolio process is \(\Delta = (\Delta_0, \Delta_1, \ldots, \Delta_{n-1})\), where

• \(\Delta_k\) is the number of shares of stock held between times \(k\) and \(k+1\).

• Each \(\Delta_k\) is \(\mathcal{F}_k\)-measurable. (No insider trading.)


3.3.2 Self-financing Value of a Portfolio Process A 


• Start with nonrandom initial wealth \(X_0\), which need not be 0.

• Define recursively

\[ X_{k+1} = \Delta_k S_{k+1} + (1+r)(X_k - \Delta_k S_k) \tag{3.1} \]
\[ \phantom{X_{k+1}} = (1+r)X_k + \Delta_k (S_{k+1} - (1+r)S_k). \tag{3.2} \]

• Then each \(X_k\) is \(\mathcal{F}_k\)-measurable.


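Theorem 3.12 Under \(\widetilde{\mathbb{P}}\), the discounted self-financing portfolio process value \(\{(1+r)^{-k}X_k, \mathcal{F}_k\}_{k=0}^n\)
is a martingale.

Proof: We have

\[ (1+r)^{-(k+1)}X_{k+1} = (1+r)^{-k}X_k + \Delta_k\left((1+r)^{-(k+1)}S_{k+1} - (1+r)^{-k}S_k\right). \]

Therefore,

\[
\begin{aligned}
\widetilde{\mathbb{E}}[(1+r)^{-(k+1)}X_{k+1}|\mathcal{F}_k]
&= \widetilde{\mathbb{E}}[(1+r)^{-k}X_k|\mathcal{F}_k] \\
&\quad + \widetilde{\mathbb{E}}[(1+r)^{-(k+1)}\Delta_k S_{k+1}|\mathcal{F}_k] \\
&\quad - \widetilde{\mathbb{E}}[(1+r)^{-k}\Delta_k S_k|\mathcal{F}_k] \\
&= (1+r)^{-k}X_k && \text{(requirement (b) of conditional exp.)} \\
&\quad + \Delta_k\,\widetilde{\mathbb{E}}[(1+r)^{-(k+1)}S_{k+1}|\mathcal{F}_k] && \text{(taking out what is known)} \\
&\quad - (1+r)^{-k}\Delta_k S_k && \text{(property (b))} \\
&= (1+r)^{-k}X_k && \text{(Theorem 3.11)}
\end{aligned}
\]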


3.4 Simple European Derivative Securities 


Definition 3.1 A simple European derivative security with expiration time \(m\) is an \(\mathcal{F}_m\)-measurable
random variable \(V_m\). (Here, \(m\) is less than or equal to \(n\), the number of periods/coin-tosses in the
model.)

Definition 3.2 A simple European derivative security \(V_m\) is said to be hedgeable if there exists
a constant \(X_0\) and a portfolio process \(\Delta = (\Delta_0, \ldots, \Delta_{m-1})\) such that the self-financing value
process \(X_0, X_1, \ldots, X_m\) given by (3.2) satisfies

\[ X_m(\omega) = V_m(\omega), \quad \forall \omega \in \Omega. \]
In this case, for \(k = 0, 1, \ldots, m\), we call \(X_k\) the APT value at time \(k\) of \(V_m\).

Theorem 4.13 (Corollary to Theorem 3.12) If a simple European security \(V_m\) is hedgeable, then
for each \(k = 0, 1, \ldots, m\), the APT value at time \(k\) of \(V_m\) is

\[ V_k = (1+r)^k\,\widetilde{\mathbb{E}}[(1+r)^{-m}V_m|\mathcal{F}_k]. \tag{4.1} \]

Proof: We first observe that if \(\{M_k, \mathcal{F}_k;\ k = 0, 1, \ldots, m\}\) is a martingale, i.e., satisfies the
martingale property

\[ \widetilde{\mathbb{E}}[M_{k+1}|\mathcal{F}_k] = M_k \]
for each \(k = 0, 1, \ldots, m-1\), then we also have
\[ \widetilde{\mathbb{E}}[M_m|\mathcal{F}_k] = M_k, \quad k = 0, 1, \ldots, m-1. \tag{4.2} \]

When \(k = m-1\), the equation (4.2) follows directly from the martingale property. For \(k = m-2\),
we use the tower property to write
\[
\begin{aligned}
\widetilde{\mathbb{E}}[M_m|\mathcal{F}_{m-2}] &= \widetilde{\mathbb{E}}[\widetilde{\mathbb{E}}[M_m|\mathcal{F}_{m-1}]|\mathcal{F}_{m-2}] \\
&= \widetilde{\mathbb{E}}[M_{m-1}|\mathcal{F}_{m-2}] \\
&= M_{m-2}.
\end{aligned}
\]

We can continue by induction to obtain (4.2).

If the simple European security \(V_m\) is hedgeable, then there is a portfolio process whose
self-financing value process \(X_0, X_1, \ldots, X_m\) satisfies \(X_m = V_m\). By definition, \(X_k\) is the APT value
at time \(k\) of \(V_m\). Theorem 3.12 says that

\[ X_0,\ (1+r)^{-1}X_1,\ \ldots,\ (1+r)^{-m}X_m \]
is a martingale, and so for each \(k\),
\[ (1+r)^{-k}X_k = \widetilde{\mathbb{E}}[(1+r)^{-m}X_m|\mathcal{F}_k] = \widetilde{\mathbb{E}}[(1+r)^{-m}V_m|\mathcal{F}_k]. \]

Therefore,
\[ X_k = (1+r)^k\,\widetilde{\mathbb{E}}[(1+r)^{-m}V_m|\mathcal{F}_k]. \]


3.5 The Binomial Model is Complete 


Can a simple European derivative security always be hedged? It depends on the model. If the answer 
is “yes”, the model is said to be complete. If the answer is “no”, the model is called incomplete. 


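Theorem 5.14 The binomial model is complete. In particular, let \(V_m\) be a simple European
derivative security, and set

\[ V_k(\omega_1, \ldots, \omega_k) = (1+r)^k\,\widetilde{\mathbb{E}}[(1+r)^{-m}V_m|\mathcal{F}_k](\omega_1, \ldots, \omega_k), \tag{5.1} \]

\[ \Delta_k(\omega_1, \ldots, \omega_k) = \frac{V_{k+1}(\omega_1, \ldots, \omega_k, H) - V_{k+1}(\omega_1, \ldots, \omega_k, T)}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}. \tag{5.2} \]

Starting with initial wealth \(V_0 = \widetilde{\mathbb{E}}[(1+r)^{-m}V_m]\), the self-financing value of the portfolio process
\(\Delta_0, \Delta_1, \ldots, \Delta_{m-1}\) is the process \(V_0, V_1, \ldots, V_m\).

Proof: Let \(V_0, \ldots, V_{m-1}\) and \(\Delta_0, \ldots, \Delta_{m-1}\) be defined by (5.1) and (5.2). Set \(X_0 = V_0\) and
define the self-financing value of the portfolio process \(\Delta_0, \ldots, \Delta_{m-1}\) by the recursive formula (3.2):

\[ X_{k+1} = \Delta_k S_{k+1} + (1+r)(X_k - \Delta_k S_k). \]
We need to show that
\[ X_k = V_k, \quad \forall k \in \{0, 1, \ldots, m\}. \tag{5.3} \]

We proceed by induction. For \(k = 0\), (5.3) holds by definition of \(X_0\). Assume that (5.3) holds for
some value of \(k\), i.e., for each fixed \((\omega_1, \ldots, \omega_k)\), we have

\[ X_k(\omega_1, \ldots, \omega_k) = V_k(\omega_1, \ldots, \omega_k). \]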


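We need to show that

\[ X_{k+1}(\omega_1, \ldots, \omega_k, H) = V_{k+1}(\omega_1, \ldots, \omega_k, H), \]

\[ X_{k+1}(\omega_1, \ldots, \omega_k, T) = V_{k+1}(\omega_1, \ldots, \omega_k, T). \]
We prove the first equality; the second can be shown similarly. Note first that
\[
\begin{aligned}
\widetilde{\mathbb{E}}[(1+r)^{-(k+1)}V_{k+1}|\mathcal{F}_k] &= \widetilde{\mathbb{E}}[\widetilde{\mathbb{E}}[(1+r)^{-m}V_m|\mathcal{F}_{k+1}]|\mathcal{F}_k] \\
&= \widetilde{\mathbb{E}}[(1+r)^{-m}V_m|\mathcal{F}_k] \\
&= (1+r)^{-k}V_k.
\end{aligned}
\]

In other words, \(\{(1+r)^{-k}V_k\}_{k=0}^m\) is a martingale under \(\widetilde{\mathbb{P}}\). In particular,

\[
\begin{aligned}
V_k(\omega_1, \ldots, \omega_k) &= \widetilde{\mathbb{E}}[(1+r)^{-1}V_{k+1}|\mathcal{F}_k](\omega_1, \ldots, \omega_k) \\
&= \frac{1}{1+r}\left(\tilde{p}\,V_{k+1}(\omega_1, \ldots, \omega_k, H) + \tilde{q}\,V_{k+1}(\omega_1, \ldots, \omega_k, T)\right).
\end{aligned}
\]

Since \((\omega_1, \ldots, \omega_k)\) will be fixed for the rest of the proof, we simplify notation by suppressing these
symbols. For example, we write the last equation as

\[ V_k = \frac{1}{1+r}\left(\tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T)\right). \]

We compute

\[
\begin{aligned}
X_{k+1}(H) &= \Delta_k S_{k+1}(H) + (1+r)(X_k - \Delta_k S_k) \\
&= \Delta_k\left(S_{k+1}(H) - (1+r)S_k\right) + (1+r)V_k \\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{S_{k+1}(H) - S_{k+1}(T)}\left(S_{k+1}(H) - (1+r)S_k\right) + \tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T) \\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{uS_k - dS_k}\left(uS_k - (1+r)S_k\right) + \tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T) \\
&= \left(V_{k+1}(H) - V_{k+1}(T)\right)\frac{u - 1 - r}{u - d} + \tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T) \\
&= \left(V_{k+1}(H) - V_{k+1}(T)\right)\tilde{q} + \tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T) \\
&= V_{k+1}(H).
\end{aligned}
\]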




Chapter 4 


The Markov Property 


4.1 Binomial Model Pricing and Hedging 


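Recall that \(V_m\) is the given simple European derivative security, and the value and portfolio
processes are given by:

\[ V_k = (1+r)^k\,\widetilde{\mathbb{E}}[(1+r)^{-m}V_m|\mathcal{F}_k], \quad k = 0, 1, \ldots, m-1, \]

\[ \Delta_k(\omega_1, \ldots, \omega_k) = \frac{V_{k+1}(\omega_1, \ldots, \omega_k, H) - V_{k+1}(\omega_1, \ldots, \omega_k, T)}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}, \quad k = 0, 1, \ldots, m-1. \]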


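Example 4.1 (Lookback Option) \(u = 2\), \(d = 0.5\), \(r = 0.25\), \(S_0 = 4\), \(\tilde{p} = \frac{1+r-d}{u-d} = 0.5\), \(\tilde{q} = 1 - \tilde{p} = 0.5\).
Consider a simple European derivative security with expiration 2, with payoff given by (see Fig. 4.1):

\[ V_2 = \max_{0 \le k \le 2}\,(S_k - 5)^+. \]
Notice that
\[ V_2(HH) = 11, \quad V_2(HT) = 3, \quad V_2(TH) = 0, \quad V_2(TT) = 0. \]
The payoff is thus "path dependent". Working backward in time, we have:
\[ V_1(H) = \frac{1}{1+r}\left[\tilde{p}\,V_2(HH) + \tilde{q}\,V_2(HT)\right] = \frac{4}{5}\left[0.5 \times 11 + 0.5 \times 3\right] = 5.60, \]

\[ V_1(T) = \frac{4}{5}\left[0.5 \times 0 + 0.5 \times 0\right] = 0, \]

\[ V_0 = \frac{4}{5}\left[0.5 \times 5.60 + 0.5 \times 0\right] = 2.24. \]

Using these values, we can now compute:

\[ \Delta_0 = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)} = 0.93, \]

\[ \Delta_1(H) = \frac{V_2(HH) - V_2(HT)}{S_2(HH) - S_2(HT)} = 0.67. \]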

[Figure 4.1: binomial tree of stock prices. \(S_0 = 4\); \(S_1(H) = 8\), \(S_1(T) = 2\); \(S_2(HH) = 16\), \(S_2(HT) = S_2(TH) = 4\), \(S_2(TT) = 1\).]


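Working forward in time, we can check that
\[ X_1(H) = \Delta_0 S_1(H) + (1+r)(X_0 - \Delta_0 S_0) = 5.59; \quad V_1(H) = 5.60, \]
\[ X_1(T) = \Delta_0 S_1(T) + (1+r)(X_0 - \Delta_0 S_0) = 0.01; \quad V_1(T) = 0, \]
\[ X_2(HH) = \Delta_1(H) S_2(HH) + (1+r)(X_1(H) - \Delta_1(H) S_1(H)) = 11.01; \quad V_2(HH) = 11, \]
etc. (The small discrepancies come from rounding \(\Delta_0\) and \(\Delta_1(H)\) to two decimal places.)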
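The backward and forward passes of Example 4.1 can be reproduced with a short script. The sketch below is illustrative only: paths are encoded as strings such as `"HT"`, and the helper names `S`, `payoff`, `V`, `delta` and `wealth` are assumptions, not notation from the notes. Because it keeps full precision instead of rounding the hedge ratios, the forward wealth matches the payoff up to floating-point error.

```python
from itertools import product

u, d, r, S0, n = 2.0, 0.5, 0.25, 4.0, 2
p = (1 + r - d) / (u - d)      # risk-neutral probability of H
q = 1 - p

def S(w):
    """Stock price after the tosses recorded in the string w."""
    return S0 * u ** w.count("H") * d ** w.count("T")

def payoff(w):
    """Lookback payoff max_{0<=k<=n} (S_k - 5)^+ along the full path w."""
    return max(max(S(w[:k]) - 5.0, 0.0) for k in range(len(w) + 1))

def V(w):
    """Option value after the tosses in w, by backward recursion."""
    if len(w) == n:
        return payoff(w)
    return (p * V(w + "H") + q * V(w + "T")) / (1 + r)

def delta(w):
    """Hedge ratio held after the tosses in w."""
    return (V(w + "H") - V(w + "T")) / (S(w + "H") - S(w + "T"))

def wealth(w):
    """Self-financing wealth along path w, starting from X_0 = V_0."""
    X = V("")
    for k in range(len(w)):
        past = w[:k]
        X = delta(past) * S(w[:k + 1]) + (1 + r) * (X - delta(past) * S(past))
    return X

paths = ["".join(t) for t in product("HT", repeat=n)]
check = {w: (wealth(w), payoff(w)) for w in paths}
```

The recursion in `V` visits every path, so its cost grows like \(2^n\); that is acceptable for \(n = 2\) but not for long horizons.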


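Example 4.2 (European Call) Let \(u = 2\), \(d = \frac{1}{2}\), \(r = \frac{1}{4}\), \(S_0 = 4\), \(\tilde{p} = \tilde{q} = \frac{1}{2}\), and consider a European call
with expiration time 2 and payoff function

\[ V_2 = (S_2 - 5)^+. \]

Note that
\[ V_2(HH) = 11, \quad V_2(HT) = V_2(TH) = 0, \quad V_2(TT) = 0, \]

\[ V_1(H) = \frac{4}{5}\left[\tfrac{1}{2} \cdot 11 + \tfrac{1}{2} \cdot 0\right] = 4.40, \]

\[ V_1(T) = \frac{4}{5}\left[\tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0\right] = 0, \]

\[ V_0 = \frac{4}{5}\left[\tfrac{1}{2} \times 4.40 + \tfrac{1}{2} \times 0\right] = 1.76. \]
Define \(v_k(x)\) to be the value of the call at time \(k\) when \(S_k = x\). Then

\[ v_2(x) = (x - 5)^+, \]

\[ v_1(x) = \frac{4}{5}\left[\tfrac{1}{2} v_2(2x) + \tfrac{1}{2} v_2(x/2)\right], \]

\[ v_0(x) = \frac{4}{5}\left[\tfrac{1}{2} v_1(2x) + \tfrac{1}{2} v_1(x/2)\right]. \]

In particular,
\[ v_2(16) = 11, \quad v_2(4) = 0, \quad v_2(1) = 0, \]

\[ v_1(8) = \frac{4}{5}\left[\tfrac{1}{2} \cdot 11 + \tfrac{1}{2} \cdot 0\right] = 4.40, \]

\[ v_1(2) = \frac{4}{5}\left[\tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0\right] = 0, \]

\[ v_0(4) = \frac{4}{5}\left[\tfrac{1}{2} \times 4.40 + \tfrac{1}{2} \times 0\right] = 1.76. \]
Let \(\delta_k(x)\) be the number of shares in the hedging portfolio at time \(k\) when \(S_k = x\). Then

\[ \delta_k(x) = \frac{v_{k+1}(2x) - v_{k+1}(x/2)}{2x - x/2}, \quad k = 0, 1. \]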


4.2 Computational Issues 


For a model with \(n\) periods (coin tosses), \(\Omega\) has \(2^n\) elements. For period \(k\), we must solve \(2^k\)
equations of the form

\[ V_k(\omega_1, \ldots, \omega_k) = \frac{1}{1+r}\left[\tilde{p}\,V_{k+1}(\omega_1, \ldots, \omega_k, H) + \tilde{q}\,V_{k+1}(\omega_1, \ldots, \omega_k, T)\right]. \]

For example, a three-month option has 66 trading days. If each day is taken to be one period, then
\(n = 66\) and \(2^{66} \approx 7 \times 10^{19}\).

There are three possible ways to deal with this problem:

1. Simulation. We have, for example, that
\[ V_0 = (1+r)^{-n}\,\widetilde{\mathbb{E}}V_n, \]
and so we could compute \(V_0\) by simulation. More specifically, we could simulate \(n\) coin
tosses \(\omega = (\omega_1, \ldots, \omega_n)\) under the risk-neutral probability measure. We could store the
value of \(V_n(\omega)\). We could repeat this several times and take the average value of \(V_n\) as an
approximation to \(\widetilde{\mathbb{E}}V_n\).

2. Approximate a many-period model by a continuous-time model. Then we can use calculus
and partial differential equations. We'll get to that.

3. Look for Markov structure. Example 4.2 has this. In period 2, the option in Example 4.2 has
three possible values \(v_2(16), v_2(4), v_2(1)\), rather than four possible values \(V_2(HH), V_2(HT), V_2(TH), V_2(TT)\).
If there were 66 periods, then in period 66 there would be 67 possible stock price values (since
the final price depends only on the number of up-ticks of the stock price, i.e., heads, so far)
and hence only 67 possible option values, rather than \(2^{66} \approx 7 \times 10^{19}\).
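Approaches 1 and 3 can be compared directly on Example 4.2. The following is a rough sketch under stated assumptions: the function names `tree_price` and `mc_price`, the seed, and the number of trials are all illustrative choices. The tree version stores only the \(k + 1\) possible values at time \(k\); the Monte Carlo version samples the number of heads, which is enough here only because the payoff depends on \(S_n\) alone.

```python
import random

def tree_price(payoff, S0, u, d, r, n):
    """Backward induction over the n+1 terminal stock prices (Markov structure)."""
    p = (1 + r - d) / (u - d)
    q = 1 - p
    # v[j] = option value in the state reached after j heads
    v = [payoff(S0 * u ** j * d ** (n - j)) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        v = [(p * v[j + 1] + q * v[j]) / (1 + r) for j in range(k + 1)]
    return v[0]

def mc_price(payoff, S0, u, d, r, n, trials, seed=0):
    """Monte Carlo estimate of (1+r)^{-n} E~[V_n] for a payoff depending on S_n only."""
    rng = random.Random(seed)
    p = (1 + r - d) / (u - d)
    total = 0.0
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))
        total += payoff(S0 * u ** heads * d ** (n - heads))
    return total / trials / (1 + r) ** n

call = lambda s: max(s - 5.0, 0.0)                        # strike 5, as in Example 4.2
exact = tree_price(call, 4.0, 2.0, 0.5, 0.25, 2)          # value from the tree
approx = mc_price(call, 4.0, 2.0, 0.5, 0.25, 2, 100_000)  # simulated estimate
```

The tree does work proportional to \(n^2\), so calling `tree_price` with `n = 66` is cheap, whereas enumerating all \(2^{66}\) paths is not feasible.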


4.3 Markov Processes

Technical condition always present: We consider only functions on \(\mathbb{R}\) and subsets of \(\mathbb{R}\) which are
Borel-measurable, i.e., we only consider subsets \(A\) of \(\mathbb{R}\) that are in \(\mathcal{B}\) and functions \(g : \mathbb{R} \to \mathbb{R}\) such
that \(g^{-1}\) is a function \(\mathcal{B} \to \mathcal{B}\).

Definition 4.1 Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space. Let \(\{\mathcal{F}_k\}_{k=0}^n\) be a filtration under \(\mathcal{F}\). Let
\(\{X_k\}_{k=0}^n\) be a stochastic process on \((\Omega, \mathcal{F}, \mathbb{P})\). This process is said to be Markov if:

• The stochastic process \(\{X_k\}\) is adapted to the filtration \(\{\mathcal{F}_k\}\), and

• (The Markov Property). For each \(k = 0, 1, \ldots, n-1\), the distribution of \(X_{k+1}\) conditioned
on \(\mathcal{F}_k\) is the same as the distribution of \(X_{k+1}\) conditioned on \(X_k\).


4.3.1 Different ways to write the Markov property 


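(a) (Agreement of distributions). For every \(A \in \mathcal{B} = \mathcal{B}(\mathbb{R})\), we have

\[
\begin{aligned}
\mathbb{P}(X_{k+1} \in A|\mathcal{F}_k) &= \mathbb{E}[I_A(X_{k+1})|\mathcal{F}_k] \\
&= \mathbb{E}[I_A(X_{k+1})|X_k] \\
&= \mathbb{P}[X_{k+1} \in A|X_k].
\end{aligned}
\]

(b) (Agreement of expectations of all functions). For every (Borel-measurable) function \(h : \mathbb{R} \to \mathbb{R}\)
for which \(\mathbb{E}|h(X_{k+1})| < \infty\), we have

\[ \mathbb{E}[h(X_{k+1})|\mathcal{F}_k] = \mathbb{E}[h(X_{k+1})|X_k]. \]

(c) (Agreement of Laplace transforms.) For every \(u \in \mathbb{R}\) for which \(\mathbb{E}e^{uX_{k+1}} < \infty\), we have

\[ \mathbb{E}\left[e^{uX_{k+1}}\,\middle|\,\mathcal{F}_k\right] = \mathbb{E}\left[e^{uX_{k+1}}\,\middle|\,X_k\right]. \]

(If we fix \(u\) and define \(h(x) = e^{ux}\), then the equations in (b) and (c) are the same. However in
(b) we have a condition which holds for every function \(h\), and in (c) we assume this condition
only for functions \(h\) of the form \(h(x) = e^{ux}\). A main result in the theory of Laplace transforms
is that if the equation holds for every \(h\) of this special form, then it holds for every \(h\), i.e., (c)
implies (b).)

(d) (Agreement of characteristic functions) For every \(u \in \mathbb{R}\), we have
\[ \mathbb{E}\left[e^{iuX_{k+1}}\,\middle|\,\mathcal{F}_k\right] = \mathbb{E}\left[e^{iuX_{k+1}}\,\middle|\,X_k\right], \]

where \(i = \sqrt{-1}\). (Since \(|e^{iux}| = |\cos ux + i \sin ux| \le 1\) we don't need to assume that \(\mathbb{E}|e^{iuX_{k+1}}| < \infty\).)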


Remark 4.1 In every case of the Markov properties where \(\mathbb{E}[\ldots|X_k]\) appears, we could just as
well write \(g(X_k)\) for some function \(g\). For example, form (a) of the Markov property can be restated
as:

For every \(A \in \mathcal{B}\), we have
\[ \mathbb{P}(X_{k+1} \in A|\mathcal{F}_k) = g(X_k), \]
where \(g\) is a function that depends on the set \(A\).


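Conditions (a)-(d) are equivalent. The Markov property as stated in (a)-(d) involves the process at
a "current" time \(k\) and one future time \(k+1\). Conditions (a)-(d) are also equivalent to conditions
involving the process at time \(k\) and multiple future times. We write these apparently stronger but
actually equivalent conditions below.

Consequences of the Markov property. Let \(j\) be a positive integer.

(A) For every \(A_{k+1} \subset \mathbb{R}, \ldots, A_{k+j} \subset \mathbb{R}\),

\[ \mathbb{P}[X_{k+1} \in A_{k+1}, \ldots, X_{k+j} \in A_{k+j}|\mathcal{F}_k] = \mathbb{P}[X_{k+1} \in A_{k+1}, \ldots, X_{k+j} \in A_{k+j}|X_k]. \]

(A') For every \(A \subset \mathbb{R}^j\),

\[ \mathbb{P}[(X_{k+1}, \ldots, X_{k+j}) \in A|\mathcal{F}_k] = \mathbb{P}[(X_{k+1}, \ldots, X_{k+j}) \in A|X_k]. \]

(B) For every function \(h : \mathbb{R}^j \to \mathbb{R}\) for which \(\mathbb{E}|h(X_{k+1}, \ldots, X_{k+j})| < \infty\), we have

\[ \mathbb{E}[h(X_{k+1}, \ldots, X_{k+j})|\mathcal{F}_k] = \mathbb{E}[h(X_{k+1}, \ldots, X_{k+j})|X_k]. \]

(C) For every \(u = (u_{k+1}, \ldots, u_{k+j}) \in \mathbb{R}^j\) for which \(\mathbb{E}|e^{u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j}}| < \infty\), we have

\[ \mathbb{E}[e^{u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j}}|\mathcal{F}_k] = \mathbb{E}[e^{u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j}}|X_k]. \]

(D) For every \(u = (u_{k+1}, \ldots, u_{k+j}) \in \mathbb{R}^j\) we have

\[ \mathbb{E}[e^{i(u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j})}|\mathcal{F}_k] = \mathbb{E}[e^{i(u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j})}|X_k]. \]

Once again, every expression of the form \(\mathbb{E}(\ldots|X_k)\) can also be written as \(g(X_k)\), where the
function \(g\) depends on the random variable represented by \(\ldots\) in this expression.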


Remark. All these Markov properties have analogues for vector-valued processes. 


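Proof that (b) \(\Rightarrow\) (A). (with \(j = 2\) in (A)) Assume (b). Then (a) also holds (take \(h = I_A\)).
Consider
\[
\begin{aligned}
&\mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2}|\mathcal{F}_k] \\
&\quad = \mathbb{E}[I_{A_{k+1}}(X_{k+1})\,I_{A_{k+2}}(X_{k+2})|\mathcal{F}_k] && \text{(Definition of conditional probability)} \\
&\quad = \mathbb{E}[\mathbb{E}[I_{A_{k+1}}(X_{k+1})\,I_{A_{k+2}}(X_{k+2})|\mathcal{F}_{k+1}]|\mathcal{F}_k] && \text{(Tower property)} \\
&\quad = \mathbb{E}[I_{A_{k+1}}(X_{k+1}) \cdot \mathbb{E}[I_{A_{k+2}}(X_{k+2})|\mathcal{F}_{k+1}]|\mathcal{F}_k] && \text{(Taking out what is known)} \\
&\quad = \mathbb{E}[I_{A_{k+1}}(X_{k+1}) \cdot \mathbb{E}[I_{A_{k+2}}(X_{k+2})|X_{k+1}]|\mathcal{F}_k] && \text{(Markov property, form (a))} \\
&\quad = \mathbb{E}[I_{A_{k+1}}(X_{k+1}) \cdot g(X_{k+1})|\mathcal{F}_k] && \text{(Remark 4.1)} \\
&\quad = \mathbb{E}[I_{A_{k+1}}(X_{k+1}) \cdot g(X_{k+1})|X_k] && \text{(Markov property, form (b))}
\end{aligned}
\]

Now take conditional expectation on both sides of the above equation, conditioned on \(\sigma(X_k)\), and
use the tower property on the left, to obtain

\[ \mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2}|X_k] = \mathbb{E}[I_{A_{k+1}}(X_{k+1}) \cdot g(X_{k+1})|X_k]. \tag{3.1} \]

Since both
\[ \mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2}|\mathcal{F}_k] \]
and
\[ \mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2}|X_k] \]
are equal to the RHS of (3.1), they are equal to each other, and this is property (A) with \(j = 2\).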


Example 4.3 It is intuitively clear that the stock price process in the binomial model is a Markov process.
We will formally prove this later. If we want to estimate the distribution of \(S_{k+1}\) based on the information in
\(\mathcal{F}_k\), the only relevant piece of information is the value of \(S_k\). For example,

\[ \widetilde{\mathbb{E}}[S_{k+1}|\mathcal{F}_k] = (\tilde{p}u + \tilde{q}d)S_k = (1+r)S_k \tag{3.2} \]

is a function of \(S_k\). Note however that form (b) of the Markov property is stronger than (3.2); the Markov
property requires that for any function \(h\),
\[ \widetilde{\mathbb{E}}[h(S_{k+1})|\mathcal{F}_k] \]
is a function of \(S_k\). Equation (3.2) is the case of \(h(x) = x\).

Consider a model with 66 periods and a simple European derivative security whose payoff at time 66 is

\[ V_{66} = \frac{1}{3}(S_{64} + S_{65} + S_{66}). \]

The value of this security at time 50 is

\[
\begin{aligned}
V_{50} &= (1+r)^{50}\,\widetilde{\mathbb{E}}[(1+r)^{-66}V_{66}|\mathcal{F}_{50}] \\
&= (1+r)^{-16}\,\widetilde{\mathbb{E}}[V_{66}|S_{50}],
\end{aligned}
\]

because the stock price process is Markov. (We are using form (B) of the Markov property here.) In other
words, the \(\mathcal{F}_{50}\)-measurable random variable \(V_{50}\) can be written as

\[ V_{50}(\omega_1, \ldots, \omega_{50}) = g(S_{50}(\omega_1, \ldots, \omega_{50})) \]

for some function \(g\), which we can determine with a bit of work.
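For this particular payoff the function \(g\) can be worked out in closed form. The derivation below is a sketch added for illustration, not part of the original notes; it uses only the martingale property of the discounted stock price (Theorem 3.11) in the iterated form (4.2), which gives \(\widetilde{\mathbb{E}}[S_j|\mathcal{F}_{50}] = (1+r)^{j-50}S_{50}\) for \(j \ge 50\).

```latex
\begin{aligned}
\widetilde{\mathbb{E}}[V_{66}|\mathcal{F}_{50}]
  &= \tfrac{1}{3}\left(\widetilde{\mathbb{E}}[S_{64}|\mathcal{F}_{50}]
     + \widetilde{\mathbb{E}}[S_{65}|\mathcal{F}_{50}]
     + \widetilde{\mathbb{E}}[S_{66}|\mathcal{F}_{50}]\right) \\
  &= \tfrac{1}{3}\left((1+r)^{14} + (1+r)^{15} + (1+r)^{16}\right) S_{50}, \\
V_{50} &= (1+r)^{-16}\,\widetilde{\mathbb{E}}[V_{66}|\mathcal{F}_{50}]
   = \tfrac{1}{3}\left((1+r)^{-2} + (1+r)^{-1} + 1\right) S_{50}, \\
g(x) &= \tfrac{x}{3}\left(1 + (1+r)^{-1} + (1+r)^{-2}\right).
\end{aligned}
```

The payoff here is linear in the stock prices, which is why linearity of conditional expectation suffices; for a nonlinear payoff one would instead sum over the possible numbers of up-ticks after time 50.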


4.4 Showing that a process is Markov 


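Definition 4.2 (Independence) Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space, and let \(\mathcal{G}\) and \(\mathcal{H}\) be
sub-\(\sigma\)-algebras of \(\mathcal{F}\). We say that \(\mathcal{G}\) and \(\mathcal{H}\) are independent if for every \(A \in \mathcal{G}\) and \(B \in \mathcal{H}\), we have

\[ \mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B). \]

We say that a random variable \(X\) is independent of a \(\sigma\)-algebra \(\mathcal{G}\) if \(\sigma(X)\), the \(\sigma\)-algebra generated
by \(X\), is independent of \(\mathcal{G}\).

Example 4.4 Consider the two-period binomial model. Recall that \(\mathcal{F}_1\) is the \(\sigma\)-algebra of sets determined
by the first toss, i.e., \(\mathcal{F}_1\) contains the four sets

\[ A_H = \{HH, HT\}, \quad A_T = \{TH, TT\}, \quad \emptyset, \quad \Omega. \]
Let \(\mathcal{H}\) be the \(\sigma\)-algebra of sets determined by the second toss, i.e., \(\mathcal{H}\) contains the four sets
\[ \{HH, TH\}, \quad \{HT, TT\}, \quad \emptyset, \quad \Omega. \]

Then \(\mathcal{F}_1\) and \(\mathcal{H}\) are independent. For example, if we take \(A = \{HH, HT\}\) from \(\mathcal{F}_1\) and \(B = \{HH, TH\}\)
from \(\mathcal{H}\), then \(\mathbb{P}(A \cap B) = \mathbb{P}(HH) = p^2\) and

\[ \mathbb{P}(A)\,\mathbb{P}(B) = (p^2 + pq)(p^2 + pq) = p^2(p + q)^2 = p^2. \]
Note that \(\mathcal{F}_1\) and \(S_2\) are not independent (unless \(p = 1\) or \(p = 0\)). For example, one of the sets in \(\sigma(S_2)\) is
\(\{\omega;\ S_2(\omega) = u^2 S_0\} = \{HH\}\). If we take \(A = \{HH, HT\}\) from \(\mathcal{F}_1\) and \(B = \{HH\}\) from \(\sigma(S_2)\), then
\(\mathbb{P}(A \cap B) = \mathbb{P}(HH) = p^2\), but

\[ \mathbb{P}(A)\,\mathbb{P}(B) = (p^2 + pq)\,p^2 = p^3(p + q) = p^3. \]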


The following lemma will be very useful in showing that a process is Markov: 


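Lemma 4.15 (Independence Lemma) Let \(X\) and \(Y\) be random variables on a probability space
\((\Omega, \mathcal{F}, \mathbb{P})\). Let \(\mathcal{G}\) be a sub-\(\sigma\)-algebra of \(\mathcal{F}\). Assume

• \(X\) is independent of \(\mathcal{G}\);

• \(Y\) is \(\mathcal{G}\)-measurable.

Let \(f(x, y)\) be a function of two variables, and define

\[ g(y) = \mathbb{E}f(X, y). \]
Then
\[ \mathbb{E}[f(X, Y)|\mathcal{G}] = g(Y). \]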

Remark. In this lemma and the following discussion, capital letters denote random variables and 
lower case letters denote nonrandom variables. 


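Example 4.5 (Showing the stock price process is Markov) Consider an \(n\)-period binomial model. Fix a
time \(k\) and define \(X = \frac{S_{k+1}}{S_k}\) and \(\mathcal{G} = \mathcal{F}_k\). Then \(X = u\) if \(\omega_{k+1} = H\) and \(X = d\) if \(\omega_{k+1} = T\). Since \(X\)
depends only on the \((k+1)\)st toss, \(X\) is independent of \(\mathcal{G}\). Define \(Y = S_k\), so that \(Y\) is \(\mathcal{G}\)-measurable. Let \(h\)
be any function and set \(f(x, y) = h(xy)\). Then

\[ g(y) = \mathbb{E}f(X, y) = \mathbb{E}h(Xy) = p\,h(uy) + q\,h(dy). \]

The Independence Lemma asserts that

\[
\begin{aligned}
\mathbb{E}[h(S_{k+1})|\mathcal{F}_k] &= \mathbb{E}\left[h\left(\frac{S_{k+1}}{S_k}\,S_k\right)\middle|\,\mathcal{F}_k\right] \\
&= \mathbb{E}[f(X, Y)|\mathcal{G}] \\
&= g(Y) \\
&= p\,h(uS_k) + q\,h(dS_k).
\end{aligned}
\]

This shows the stock price is Markov. Indeed, if we condition both sides of the above equation on \(\sigma(S_k)\) and
use the tower property on the left and the fact that the right hand side is \(\sigma(S_k)\)-measurable, we obtain

\[ \mathbb{E}[h(S_{k+1})|S_k] = p\,h(uS_k) + q\,h(dS_k). \]

Thus \(\mathbb{E}[h(S_{k+1})|\mathcal{F}_k]\) and \(\mathbb{E}[h(S_{k+1})|S_k]\) are equal and form (b) of the Markov property is proved.

Not only have we shown that the stock price process is Markov, but we have also obtained a formula for
\(\mathbb{E}[h(S_{k+1})|\mathcal{F}_k]\) as a function of \(S_k\). This is a special case of Remark 4.1.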
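The conclusion of Example 4.5 can be checked by brute force on a small tree. In the sketch below (an illustration only; the parameter values, the test function `h`, and the helper names are all assumptions), the conditional expectation \(\mathbb{E}[h(S_{k+1})|\mathcal{F}_k]\) is computed directly by averaging over each atom of \(\mathcal{F}_k\), and then compared with the formula \(p\,h(uS_k) + q\,h(dS_k)\) given by the Independence Lemma.

```python
from itertools import product
import math

p, u, d, S0, n, k = 0.3, 2.0, 0.5, 4.0, 3, 1    # hypothetical values
q = 1 - p
h = lambda x: math.log(1.0 + x) ** 2             # an arbitrary test function

def prob(w):
    return p ** w.count("H") * q ** w.count("T")

def S(w, j):
    """S_j along the path w (a tuple of 'H'/'T')."""
    return S0 * u ** w[:j].count("H") * d ** w[:j].count("T")

def cond_exp_h(prefix):
    """E[h(S_{k+1}) | F_k] on the atom of paths that start with `prefix`."""
    atom = [prefix + t for t in product("HT", repeat=n - len(prefix))]
    mass = sum(prob(w) for w in atom)
    return sum(h(S(w, k + 1)) * prob(w) for w in atom) / mass

results = []
for prefix in product("HT", repeat=k):
    Sk = S(prefix, k)
    lemma = p * h(u * Sk) + q * h(d * Sk)        # g(S_k) from the lemma
    results.append((cond_exp_h(prefix), lemma))
```

Each pair in `results` should agree up to rounding, for any choice of `h`; the value on an atom depends on the first `k` tosses only through `Sk`.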


4.5 Application to Exotic Options 
Consider an \(n\)-period binomial model. Define the running maximum of the stock price to be
\[ M_k = \max_{0 \le j \le k} S_j. \]

Consider a simple European derivative security with payoff at time \(n\) of \(v_n(S_n, M_n)\).

Examples:

• \(v_n(S_n, M_n) = (M_n - K)^+\) (Lookback option);
• \(v_n(S_n, M_n) = I_{M_n \ge B}\,(S_n - K)^+\) (Knock-in Barrier option).

Lemma 5.16 The two-dimensional process \(\{(S_k, M_k)\}_{k=0}^n\) is Markov. (Here we are working under
the risk-neutral measure \(\widetilde{\mathbb{P}}\), although that does not matter.)

Proof: Fix \(k\). We have
\[ M_{k+1} = M_k \vee S_{k+1}, \]
where \(\vee\) indicates the maximum of two quantities. Let \(Z = \frac{S_{k+1}}{S_k}\), so

\[ \widetilde{\mathbb{P}}(Z = u) = \tilde{p}, \quad \widetilde{\mathbb{P}}(Z = d) = \tilde{q}, \]
and \(Z\) is independent of \(\mathcal{F}_k\). Let \(h(x, y)\) be a function of two variables. We have

\[
\begin{aligned}
h(S_{k+1}, M_{k+1}) &= h(S_{k+1}, M_k \vee S_{k+1}) \\
&= h(ZS_k, M_k \vee (ZS_k)).
\end{aligned}
\]

Define

\[
\begin{aligned}
g(x, y) &= \widetilde{\mathbb{E}}\,h(Zx, y \vee (Zx)) \\
&= \tilde{p}\,h(ux, y \vee (ux)) + \tilde{q}\,h(dx, y \vee (dx)).
\end{aligned}
\]

The Independence Lemma implies
\[ \widetilde{\mathbb{E}}[h(S_{k+1}, M_{k+1})|\mathcal{F}_k] = g(S_k, M_k) = \tilde{p}\,h(uS_k, M_k \vee (uS_k)) + \tilde{q}\,h(dS_k, M_k), \]

the second equality being a consequence of the fact that \(M_k \vee dS_k = M_k\). Since the RHS is a
function of \((S_k, M_k)\), we have proved the Markov property (form (b)) for this two-dimensional
process.


Continuing with the exotic option of the previous Lemma... Let \(V_k\) denote the value of the derivative
security at time \(k\). Since \((1+r)^{-k}V_k\) is a martingale under \(\widetilde{\mathbb{P}}\), we have

\[ V_k = \frac{1}{1+r}\,\widetilde{\mathbb{E}}[V_{k+1}|\mathcal{F}_k], \quad k = 0, 1, \ldots, n-1. \]

At the final time, we have
\[ V_n = v_n(S_n, M_n). \]

Stepping back one step, we can compute

\[
\begin{aligned}
V_{n-1} &= \frac{1}{1+r}\,\widetilde{\mathbb{E}}[v_n(S_n, M_n)|\mathcal{F}_{n-1}] \\
&= \frac{1}{1+r}\left[\tilde{p}\,v_n(uS_{n-1}, uS_{n-1} \vee M_{n-1}) + \tilde{q}\,v_n(dS_{n-1}, M_{n-1})\right].
\end{aligned}
\]

This leads us to define

\[ v_{n-1}(x, y) = \frac{1}{1+r}\left[\tilde{p}\,v_n(ux, ux \vee y) + \tilde{q}\,v_n(dx, y)\right] \]

so that

\[ V_{n-1} = v_{n-1}(S_{n-1}, M_{n-1}). \]
The general algorithm is

\[ v_k(x, y) = \frac{1}{1+r}\left[\tilde{p}\,v_{k+1}(ux, ux \vee y) + \tilde{q}\,v_{k+1}(dx, y)\right], \]

and the value of the option at time \(k\) is \(v_k(S_k, M_k)\). Since this is a simple European option, the
hedging portfolio is given by the usual formula, which in this case is

\[ \Delta_k = \frac{v_{k+1}(uS_k, (uS_k) \vee M_k) - v_{k+1}(dS_k, M_k)}{(u - d)S_k}. \]

Chapter 5 


Stopping Times and American Options 


5.1 American Pricing 


Let us first review the European pricing formula in a Markov model. Consider the Binomial
model with \(n\) periods. Let \(V_n = g(S_n)\) be the payoff of a derivative security. Define by backward
recursion:

\[
\begin{aligned}
v_n(x) &= g(x), \\
v_k(x) &= \frac{1}{1+r}\left[\tilde{p}\,v_{k+1}(ux) + \tilde{q}\,v_{k+1}(dx)\right].
\end{aligned}
\]

Then \(v_k(S_k)\) is the value of the option at time \(k\), and the hedging portfolio is given by

\[ \Delta_k = \frac{v_{k+1}(uS_k) - v_{k+1}(dS_k)}{(u - d)S_k}, \quad k = 0, 1, 2, \ldots, n-1. \]

Now consider an American option. Again a function \(g\) is specified. In any period \(k\), the holder
of the derivative security can "exercise" and receive payment \(g(S_k)\). Thus, the hedging portfolio
should create a wealth process which satisfies

\[ X_k \ge g(S_k), \quad \forall k, \text{ almost surely.} \]

This is because the value of the derivative security at time \(k\) is at least \(g(S_k)\), and the wealth process
value at that time must equal the value of the derivative security.

American algorithm.

\[
\begin{aligned}
v_n(x) &= g(x), \\
v_k(x) &= \max\left\{\frac{1}{1+r}\left(\tilde{p}\,v_{k+1}(ux) + \tilde{q}\,v_{k+1}(dx)\right),\ g(x)\right\}.
\end{aligned}
\]

Then \(v_k(S_k)\) is the value of the option at time \(k\).


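[Figure 5.1: binomial tree with terminal values. \(S_2(HH) = 16\), \(v_2(16) = 0\); \(S_2(HT) = S_2(TH) = 4\), \(v_2(4) = 1\); \(S_2(TT) = 1\), \(v_2(1) = 4\).]

Example 5.1 See Fig. 5.1. \(S_0 = 4\), \(u = 2\), \(d = \frac{1}{2}\), \(r = \frac{1}{4}\), \(\tilde{p} = \tilde{q} = \frac{1}{2}\), \(n = 2\). Set \(v_2(x) = g(x) = (5 - x)^+\).

Then

\[
\begin{aligned}
v_1(8) &= \max\left\{\tfrac{4}{5}\left[\tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1\right],\ (5 - 8)^+\right\} = \max\left\{\tfrac{2}{5},\ 0\right\} = 0.40, \\
v_1(2) &= \max\left\{\tfrac{4}{5}\left[\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 4\right],\ (5 - 2)^+\right\} = \max\{2,\ 3\} = 3.00, \\
v_0(4) &= \max\left\{\tfrac{4}{5}\left[\tfrac{1}{2} \cdot (0.40) + \tfrac{1}{2} \cdot (3.0)\right],\ (5 - 4)^+\right\} = \max\{1.36,\ 1\} = 1.36.
\end{aligned}
\]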


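Let us now construct the hedging portfolio for this option. Begin with initial wealth \(X_0 = 1.36\). Compute
\(\Delta_0\) as follows:

\[
\begin{aligned}
0.40 &= v_1(S_1(H)) \\
&= S_1(H)\Delta_0 + (1+r)(X_0 - \Delta_0 S_0) \\
&= 8\Delta_0 + \tfrac{5}{4}(1.36 - 4\Delta_0) \\
&= 3\Delta_0 + 1.70 \implies \Delta_0 = -0.43, \\
3.00 &= v_1(S_1(T)) \\
&= S_1(T)\Delta_0 + (1+r)(X_0 - \Delta_0 S_0) \\
&= 2\Delta_0 + \tfrac{5}{4}(1.36 - 4\Delta_0) \\
&= -3\Delta_0 + 1.70 \implies \Delta_0 = -0.43.
\end{aligned}
\]

Using \(\Delta_0 = -0.43\) results in

\[ X_1(H) = v_1(S_1(H)) = 0.40, \quad X_1(T) = v_1(S_1(T)) = 3.00. \]

Now let us compute \(\Delta_1\) (recall that \(S_1(T) = 2\)):
\[
\begin{aligned}
1 &= v_2(4) \\
&= S_2(TH)\Delta_1(T) + (1+r)(X_1(T) - \Delta_1(T)S_1(T)) \\
&= 4\Delta_1(T) + \tfrac{5}{4}(3 - 2\Delta_1(T)) \\
&= 1.5\Delta_1(T) + 3.75 \implies \Delta_1(T) = -1.83, \\
4 &= v_2(1) \\
&= S_2(TT)\Delta_1(T) + (1+r)(X_1(T) - \Delta_1(T)S_1(T)) \\
&= \Delta_1(T) + \tfrac{5}{4}(3 - 2\Delta_1(T)) \\
&= -1.5\Delta_1(T) + 3.75 \implies \Delta_1(T) = -0.17.
\end{aligned}
\]
We get different answers for \(\Delta_1(T)\)! If we had \(X_1(T) = 2\), the value of the European put, we would have
\[ 1 = 1.5\Delta_1(T) + 2.5 \implies \Delta_1(T) = -1, \]

\[ 4 = -1.5\Delta_1(T) + 2.5 \implies \Delta_1(T) = -1. \]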
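The American algorithm and the inconsistency found above can be reproduced as follows. This is a minimal sketch with illustrative names (`american_values`, `delta_needed`), using the put with strike 5 and the parameters \(S_0 = 4\), \(u = 2\), \(d = \frac{1}{2}\), \(r = \frac{1}{4}\), \(n = 2\). States are indexed by the number of up-moves, which relies on the payoff depending on the current stock price only.

```python
def american_values(g, S0, u, d, r, n):
    """Return v with v[k][j] = American value at time k after j up-moves."""
    p = (1 + r - d) / (u - d)
    q = 1 - p
    price = lambda k, j: S0 * u ** j * d ** (k - j)
    v = [[0.0] * (k + 1) for k in range(n + 1)]
    v[n] = [g(price(n, j)) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        for j in range(k + 1):
            cont = (p * v[k + 1][j + 1] + q * v[k + 1][j]) / (1 + r)
            v[k][j] = max(cont, g(price(k, j)))   # continue or exercise
    return v

def delta_needed(X1, S1, S2, target, r):
    """Shares needed so that wealth X1 at price S1 becomes `target` at price S2."""
    return (target - (1 + r) * X1) / (S2 - (1 + r) * S1)

put = lambda x: max(5.0 - x, 0.0)
v = american_values(put, 4.0, 2.0, 0.5, 0.25, 2)
# v[1][1] is v_1(8), v[1][0] is v_1(2), v[0][0] is v_0(4)

# Starting from X_1(T) = 3, the two terminal states demand different hedges.
d_up = delta_needed(3.0, 2.0, 4.0, 1.0, 0.25)     # to reach v_2(4) = 1
d_down = delta_needed(3.0, 2.0, 1.0, 4.0, 0.25)   # to reach v_2(1) = 4
# Starting from 2 (the continuation value), both states agree.
e_up = delta_needed(2.0, 2.0, 4.0, 1.0, 0.25)
e_down = delta_needed(2.0, 2.0, 1.0, 4.0, 0.25)
```

The mismatch between `d_up` and `d_down` is the numerical form of the observation that wealth 3 at \(\omega_1 = T\) is more than is needed to hedge the remaining payoff.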


5.2 Value of Portfolio Hedging an American Option 


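\[
\begin{aligned}
X_{k+1} &= \Delta_k S_{k+1} + (1+r)(X_k - C_k - \Delta_k S_k) \\
&= (1+r)X_k + \Delta_k(S_{k+1} - (1+r)S_k) - (1+r)C_k.
\end{aligned}
\]

Here, \(C_k\) is the amount "consumed" at time \(k\).

• The discounted value of the portfolio is a supermartingale.
• The value satisfies \(X_k \ge g(S_k)\), \(k = 0, 1, \ldots, n\).

• The value process is the smallest process with these properties.

When do you consume? If
\[ \widetilde{\mathbb{E}}[(1+r)^{-(k+1)}v_{k+1}(S_{k+1})|\mathcal{F}_k] < (1+r)^{-k}v_k(S_k), \]

or, equivalently,

\[ \widetilde{\mathbb{E}}\left[\frac{1}{1+r}\,v_{k+1}(S_{k+1})\,\middle|\,\mathcal{F}_k\right] < v_k(S_k) \]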

and the holder of the American option does not exercise, then the seller of the option can consume
to close the gap. By doing this, he can ensure that \(X_k = v_k(S_k)\) for all \(k\), where \(v_k\) is the value
defined by the American algorithm in Section 5.1.

In the previous example, \(v_1(S_1(T)) = 3\), \(v_2(S_2(TH)) = 1\) and \(v_2(S_2(TT)) = 4\). Therefore,

\[
\begin{aligned}
\widetilde{\mathbb{E}}\left[\frac{1}{1+r}\,v_2(S_2)\,\middle|\,\mathcal{F}_1\right](T) &= \frac{4}{5}\left[\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 4\right] \\
&= \frac{4}{5} \cdot \frac{5}{2} \\
&= 2, \\
v_1(S_1(T)) &= 3,
\end{aligned}
\]

so there is a gap of size 1. If the owner of the option does not exercise it at time one in the state
\(\omega_1 = T\), then the seller can consume 1 at time 1. Thereafter, he uses the usual hedging portfolio

\[ \Delta_k = \frac{v_{k+1}(uS_k) - v_{k+1}(dS_k)}{(u - d)S_k}. \]

In the example, we have \(v_1(S_1(T)) = g(S_1(T))\). It is optimal for the owner of the American option
to exercise whenever its value \(v_k(S_k)\) agrees with its intrinsic value \(g(S_k)\).
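The consumption step in this example is small enough to write out explicitly. The sketch below is illustrative (the variable names are assumptions): at \(\omega_1 = T\) the seller holds 3, consumes the gap between 3 and the discounted continuation value, and then hedges with the usual formula.

```python
r, u, d = 0.25, 2.0, 0.5
p = q = 0.5                         # risk-neutral probabilities
S1_T = 2.0                          # stock price after a tail
v1_T = 3.0                          # American put value v_1(2)
S2 = {"TH": 4.0, "TT": 1.0}         # time-2 stock prices after a tail
v2 = {"TH": 1.0, "TT": 4.0}         # put payoffs v_2(4), v_2(1)

continuation = (p * v2["TH"] + q * v2["TT"]) / (1 + r)
C1 = v1_T - continuation            # amount consumed at time 1
delta1 = (v2["TH"] - v2["TT"]) / ((u - d) * S1_T)

# Wealth at time 2 from X_2 = Delta_1 S_2 + (1+r)(X_1 - C_1 - Delta_1 S_1).
X2 = {w: delta1 * S2[w] + (1 + r) * (v1_T - C1 - delta1 * S1_T) for w in S2}
```

After consuming `C1`, a single hedge ratio replicates both terminal values, which removes the inconsistency seen in Example 5.1.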


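Definition 5.1 (Stopping Time) Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space and let \(\{\mathcal{F}_k\}_{k=0}^n\) be a
filtration. A stopping time is a random variable \(\tau : \Omega \to \{0, 1, 2, \ldots, n\} \cup \{\infty\}\) with the property that:

\[ \{\omega \in \Omega;\ \tau(\omega) = k\} \in \mathcal{F}_k, \quad \forall k = 0, 1, \ldots, n, \infty. \]

Example 5.2 Consider the binomial model with \(n = 2\), \(S_0 = 4\), \(u = 2\), \(d = \frac{1}{2}\), \(r = \frac{1}{4}\), so \(\tilde{p} = \tilde{q} = \frac{1}{2}\). Let
\(v_0, v_1, v_2\) be the value functions defined for the American put with strike price 5. Define

\[ \tau(\omega) = \min\{k;\ v_k(S_k) = (5 - S_k)^+\}. \]

The stopping time \(\tau\) corresponds to "stopping the first time the value of the option agrees with its intrinsic
value". It is an optimal exercise time. We note that

\[ \tau(\omega) = \begin{cases} 1 & \text{if } \omega \in A_T, \\ 2 & \text{if } \omega \in A_H. \end{cases} \]

We verify that \(\tau\) is indeed a stopping time:

\[
\begin{aligned}
\{\omega;\ \tau(\omega) = 0\} &= \emptyset \in \mathcal{F}_0, \\
\{\omega;\ \tau(\omega) = 1\} &= A_T \in \mathcal{F}_1, \\
\{\omega;\ \tau(\omega) = 2\} &= A_H \in \mathcal{F}_2.
\end{aligned}
\]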

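Example 5.3 (A random time which is not a stopping time) In the same binomial model as in the previous
example, define
\[ \rho(\omega) = \min\{k;\ S_k(\omega) = m_2(\omega)\}, \]

where \(m_2 = \min_{0 \le j \le 2} S_j\). In other words, \(\rho\) stops when the stock price reaches its minimum value. This
random variable is given by

\[ \rho(\omega) = \begin{cases} 0 & \text{if } \omega \in A_H, \\ 1 & \text{if } \omega = TH, \\ 2 & \text{if } \omega = TT. \end{cases} \]

We verify that \(\rho\) is not a stopping time:

\[
\begin{aligned}
\{\omega;\ \rho(\omega) = 0\} &= A_H \notin \mathcal{F}_0, \\
\{\omega;\ \rho(\omega) = 1\} &= \{TH\} \notin \mathcal{F}_1, \\
\{\omega;\ \rho(\omega) = 2\} &= \{TT\} \in \mathcal{F}_2.
\end{aligned}
\]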
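The difference between Examples 5.2 and 5.3 can be tested mechanically. On the coin-toss space, the set \(\{\tau = k\}\) belongs to \(\mathcal{F}_k\) exactly when it is a union of atoms of \(\mathcal{F}_k\), that is, when any two paths sharing the first \(k\) tosses either both have \(\tau = k\) or both do not. The sketch below uses that criterion; the helper name `is_stopping_time` and the string encoding of paths are illustrative assumptions.

```python
from itertools import product

n = 2
S0, u, d = 4.0, 2.0, 0.5
paths = ["".join(w) for w in product("HT", repeat=n)]

def S(w, k):
    """Stock price S_k along the path w."""
    return S0 * u ** w[:k].count("H") * d ** w[:k].count("T")

def is_stopping_time(time):
    """Check that {time = k} is decidable from the first k tosses alone."""
    for w in paths:
        k = time(w)
        for w2 in paths:
            if w2[:k] == w[:k] and time(w2) != k:
                return False
    return True

# Example 5.2: exercise time of the American put with strike 5.
tau = lambda w: 1 if w[0] == "T" else 2
# Example 5.3: first time the stock price attains its minimum over the path.
rho = lambda w: min(range(n + 1), key=lambda k: S(w, k))

tau_ok = is_stopping_time(tau)
rho_ok = is_stopping_time(rho)
```

Here `rho` fails because deciding whether the minimum has already been reached requires knowledge of future tosses.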


5.3 Information up to a Stopping Time 


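Definition 5.2 Let \(\tau\) be a stopping time. We say that a set \(A \subset \Omega\) is determined by time \(\tau\) provided
that
\[ A \cap \{\omega;\ \tau(\omega) = k\} \in \mathcal{F}_k, \quad \forall k. \]

The collection of sets determined by \(\tau\) is a \(\sigma\)-algebra, which we denote by \(\mathcal{F}_\tau\).

Example 5.4 In the binomial model considered earlier, let
\[ \tau = \min\{k;\ v_k(S_k) = (5 - S_k)^+\}, \]
i.e.,
\[ \tau(\omega) = \begin{cases} 1 & \text{if } \omega \in A_T, \\ 2 & \text{if } \omega \in A_H. \end{cases} \]

The set \(\{HT\}\) is determined by time \(\tau\), but the set \(\{TH\}\) is not. Indeed,

\[
\begin{aligned}
\{HT\} \cap \{\omega;\ \tau(\omega) = 0\} &= \emptyset \in \mathcal{F}_0, \\
\{HT\} \cap \{\omega;\ \tau(\omega) = 1\} &= \emptyset \in \mathcal{F}_1, \\
\{HT\} \cap \{\omega;\ \tau(\omega) = 2\} &= \{HT\} \in \mathcal{F}_2,
\end{aligned}
\]

but
\[ \{TH\} \cap \{\omega;\ \tau(\omega) = 1\} = \{TH\} \notin \mathcal{F}_1. \]
The atoms of \(\mathcal{F}_\tau\) are
\[ \{HT\}, \quad \{HH\}, \quad A_T = \{TH, TT\}. \]

Notation 5.1 (Value of Stochastic Process at a Stopping Time) If \((\Omega, \mathcal{F}, \mathbb{P})\) is a probability space,
\(\{\mathcal{F}_k\}_{k=0}^n\) is a filtration under \(\mathcal{F}\), \(\{X_k\}_{k=0}^n\) is a stochastic process adapted to this filtration, and \(\tau\) is
a stopping time with respect to the same filtration, then \(X_\tau\) is an \(\mathcal{F}_\tau\)-measurable random variable
whose value at \(\omega\) is given by

\[ X_\tau(\omega) = X_{\tau(\omega)}(\omega). \]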

82 


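Theorem 3.17 (Optional Sampling) Suppose that $\{Y_k, \mathcal{F}_k\}_{k=0}^\infty$ (or $\{Y_k, \mathcal{F}_k\}_{k=0}^n$) is a submartingale.
Let $\tau$ and $\rho$ be bounded stopping times, i.e., there is a nonrandom number $n$ such that

\[ \tau \le n, \quad \rho \le n, \quad \text{almost surely.} \]

If $\tau \le \rho$ almost surely, then

\[ Y_\tau \le \mathbb{E}(Y_\rho \mid \mathcal{F}_\tau). \]

Taking expectations, we obtain $\mathbb{E} Y_\tau \le \mathbb{E} Y_\rho$, and in particular, $Y_0 = \mathbb{E} Y_0 \le \mathbb{E} Y_\rho$. If $\{Y_k, \mathcal{F}_k\}_{k=0}^\infty$
is a supermartingale, then $\tau \le \rho$ implies $Y_\tau \ge \mathbb{E}(Y_\rho \mid \mathcal{F}_\tau)$.
If $\{Y_k, \mathcal{F}_k\}_{k=0}^\infty$ is a martingale, then $\tau \le \rho$ implies $Y_\tau = \mathbb{E}(Y_\rho \mid \mathcal{F}_\tau)$.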


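Example 5.5 In Example 5.4 considered earlier, we define $\rho(\omega) = 2$ for all $\omega \in \Omega$. Under the risk-neutral
probability measure, the discounted stock price process $\left(\frac{5}{4}\right)^{-k} S_k$ is a martingale. We compute

\[ \tilde{\mathbb{E}}\left[ \left(\tfrac{4}{5}\right)^2 S_2 \,\Big|\, \mathcal{F}_\tau \right]. \]

The atoms of $\mathcal{F}_\tau$ are $\{HH\}$, $\{HT\}$, and $A_T$. Therefore,

\[ \tilde{\mathbb{E}}\left[ \left(\tfrac{4}{5}\right)^2 S_2 \,\Big|\, \mathcal{F}_\tau \right](HH) = \left(\tfrac{4}{5}\right)^2 S_2(HH), \qquad \tilde{\mathbb{E}}\left[ \left(\tfrac{4}{5}\right)^2 S_2 \,\Big|\, \mathcal{F}_\tau \right](HT) = \left(\tfrac{4}{5}\right)^2 S_2(HT), \]

and for $\omega \in A_T$,

\[ \tilde{\mathbb{E}}\left[ \left(\tfrac{4}{5}\right)^2 S_2 \,\Big|\, \mathcal{F}_\tau \right](\omega) = \tfrac{1}{2}\left(\tfrac{4}{5}\right)^2 S_2(TH) + \tfrac{1}{2}\left(\tfrac{4}{5}\right)^2 S_2(TT) = \tfrac{1}{2} \times 2.56 + \tfrac{1}{2} \times 0.64 = 1.60. \]

In every case we have gotten (see Fig. 5.2)

\[ \tilde{\mathbb{E}}\left[ \left(\tfrac{4}{5}\right)^2 S_2 \,\Big|\, \mathcal{F}_\tau \right](\omega) = \left(\tfrac{4}{5}\right)^{\tau(\omega)} S_{\tau(\omega)}(\omega). \]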


[Figure 5.2 shows the tree of discounted stock prices: $S_0 = 4$; $(4/5) S_1(H) = 6.40$, $(4/5) S_1(T) = 1.60$; $(16/25) S_2(HH) = 10.24$, $(16/25) S_2(HT) = (16/25) S_2(TH) = 2.56$, $(16/25) S_2(TT) = 0.64$.]

Figure 5.2: Illustrating the optional sampling theorem.


Chapter 6 


Properties of American Derivative 
Securities 


6.1 The properties 


Definition 6.1. An American derivative security is a sequence of non-negative random variables
$\{G_k\}_{k=0}^n$ such that each $G_k$ is $\mathcal{F}_k$-measurable. The owner of an American derivative security can
exercise at any time $k$, and if he does, he receives the payment $G_k$.


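(a) The value $V_k$ of the security at time $k$ is
\[ V_k = \max_{\tau} (1+r)^k\, \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \mid \mathcal{F}_k \right], \]
where the maximum is over all stopping times $\tau$ satisfying $\tau \ge k$ almost surely.
(b) The discounted value process $\{(1+r)^{-k} V_k\}_{k=0}^n$ is the smallest supermartingale which satisfies
\[ V_k \ge G_k, \quad \forall k, \text{ almost surely.} \]
(c) Any stopping time $\tau$ which satisfies
\[ V_0 = \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \right] \]
is an optimal exercise time. In particular
\[ \tau \triangleq \min\{k;\ V_k = G_k\} \]
is an optimal exercise time.
(d) The hedging portfolio is given by
\[ \Delta_k(\omega_1, \ldots, \omega_k) = \frac{V_{k+1}(\omega_1, \ldots, \omega_k, H) - V_{k+1}(\omega_1, \ldots, \omega_k, T)}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}, \quad k = 0, 1, \ldots, n-1. \]
(e) Suppose for some $k$ and $\omega$, we have $V_k(\omega) = G_k(\omega)$. Then the owner of the derivative security
should exercise it. If he does not, then the seller of the security can immediately consume
\[ V_k(\omega) - \frac{1}{1+r} \tilde{\mathbb{E}}\left[ V_{k+1} \mid \mathcal{F}_k \right](\omega) \]
and still maintain the hedge.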


6.2 Proofs of the Properties 


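Let $\{G_k\}_{k=0}^n$ be a sequence of non-negative random variables such that each $G_k$ is $\mathcal{F}_k$-measurable.
Define $T_k$ to be the set of all stopping times $\tau$ satisfying $k \le \tau \le n$ almost surely. Define also

\[ V_k \triangleq (1+r)^k \max_{\tau \in T_k} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \mid \mathcal{F}_k \right]. \]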


Lemma 2.18 $V_k \ge G_k$ for every $k$.


Proof: Take $\tau \in T_k$ to be the constant $k$.


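Lemma 2.19 The process $\{(1+r)^{-k} V_k\}_{k=0}^n$ is a supermartingale.
Proof: Let $\tau^*$ attain the maximum in the definition of $V_{k+1}$, i.e.,

\[ (1+r)^{-(k+1)} V_{k+1} = \tilde{\mathbb{E}}\left[ (1+r)^{-\tau^*} G_{\tau^*} \mid \mathcal{F}_{k+1} \right]. \]

Because $\tau^*$ is also in $T_k$, we have

\[ \begin{aligned}
\tilde{\mathbb{E}}\left[ (1+r)^{-(k+1)} V_{k+1} \mid \mathcal{F}_k \right]
&= \tilde{\mathbb{E}}\left[ \tilde{\mathbb{E}}\left[ (1+r)^{-\tau^*} G_{\tau^*} \mid \mathcal{F}_{k+1} \right] \mid \mathcal{F}_k \right] \\
&= \tilde{\mathbb{E}}\left[ (1+r)^{-\tau^*} G_{\tau^*} \mid \mathcal{F}_k \right] \\
&\le \max_{\tau \in T_k} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \mid \mathcal{F}_k \right] \\
&= (1+r)^{-k} V_k.
\end{aligned} \]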


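Lemma 2.20 If $\{Y_k\}_{k=0}^n$ is another process satisfying

\[ Y_k \ge G_k, \quad k = 0, 1, \ldots, n, \text{ a.s.,} \]

and $\{(1+r)^{-k} Y_k\}_{k=0}^n$ is a supermartingale, then

\[ Y_k \ge V_k, \quad k = 0, 1, \ldots, n, \text{ a.s.} \]

Proof: The optional sampling theorem for the supermartingale $\{(1+r)^{-k} Y_k\}_{k=0}^n$ implies

\[ \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} Y_\tau \mid \mathcal{F}_k \right] \le (1+r)^{-k} Y_k, \quad \forall \tau \in T_k. \]

Therefore,

\[ \begin{aligned}
V_k &= (1+r)^k \max_{\tau \in T_k} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \mid \mathcal{F}_k \right] \\
&\le (1+r)^k \max_{\tau \in T_k} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} Y_\tau \mid \mathcal{F}_k \right] \\
&\le (1+r)^{-k} (1+r)^k Y_k \\
&= Y_k.
\end{aligned} \]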
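Lemma 2.21 Define

\[ C_k = V_k - \frac{1}{1+r} \tilde{\mathbb{E}}\left[ V_{k+1} \mid \mathcal{F}_k \right]
= (1+r)^k \left\{ (1+r)^{-k} V_k - \tilde{\mathbb{E}}\left[ (1+r)^{-(k+1)} V_{k+1} \mid \mathcal{F}_k \right] \right\}. \]

Since $\{(1+r)^{-k} V_k\}_{k=0}^n$ is a supermartingale, $C_k$ must be non-negative almost surely. Define

\[ \Delta_k(\omega_1, \ldots, \omega_k) = \frac{V_{k+1}(\omega_1, \ldots, \omega_k, H) - V_{k+1}(\omega_1, \ldots, \omega_k, T)}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}. \]

Set $X_0 = V_0$ and define recursively

\[ X_{k+1} = \Delta_k S_{k+1} + (1+r)(X_k - C_k - \Delta_k S_k). \]

Then

\[ X_k = V_k \quad \forall k. \]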


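Proof: We proceed by induction on $k$. The induction hypothesis is that $X_k = V_k$ for some
$k \in \{0, 1, \ldots, n-1\}$, i.e., for each fixed $(\omega_1, \ldots, \omega_k)$ we have

\[ X_k(\omega_1, \ldots, \omega_k) = V_k(\omega_1, \ldots, \omega_k). \]

We need to show that

\[ X_{k+1}(\omega_1, \ldots, \omega_k, H) = V_{k+1}(\omega_1, \ldots, \omega_k, H), \qquad X_{k+1}(\omega_1, \ldots, \omega_k, T) = V_{k+1}(\omega_1, \ldots, \omega_k, T). \]

We prove the first equality; the proof of the second is similar. Note first that

\[ \begin{aligned}
V_k(\omega_1, \ldots, \omega_k) - C_k(\omega_1, \ldots, \omega_k)
&= \frac{1}{1+r} \tilde{\mathbb{E}}\left[ V_{k+1} \mid \mathcal{F}_k \right](\omega_1, \ldots, \omega_k) \\
&= \frac{1}{1+r} \left( \tilde{p}\, V_{k+1}(\omega_1, \ldots, \omega_k, H) + \tilde{q}\, V_{k+1}(\omega_1, \ldots, \omega_k, T) \right).
\end{aligned} \]

Since $(\omega_1, \ldots, \omega_k)$ will be fixed for the rest of the proof, we will suppress these symbols. For
example, the last equation can be written simply as

\[ V_k - C_k = \frac{1}{1+r} \left( \tilde{p}\, V_{k+1}(H) + \tilde{q}\, V_{k+1}(T) \right). \]

We compute

\[ \begin{aligned}
X_{k+1}(H) &= \Delta_k S_{k+1}(H) + (1+r)(X_k - C_k - \Delta_k S_k) \\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{S_{k+1}(H) - S_{k+1}(T)} \left( S_{k+1}(H) - (1+r) S_k \right) + (1+r)(V_k - C_k) \\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{(u - d) S_k} \left( u S_k - (1+r) S_k \right) + \tilde{p}\, V_{k+1}(H) + \tilde{q}\, V_{k+1}(T) \\
&= \left( V_{k+1}(H) - V_{k+1}(T) \right) \tilde{q} + \tilde{p}\, V_{k+1}(H) + \tilde{q}\, V_{k+1}(T) \\
&= V_{k+1}(H).
\end{aligned} \]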
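Lemma 2.21 can be checked on a small tree. The sketch below is an illustration only: it assumes the two-period American put with strike 5 used in Chapter 5 ($S_0 = 4$, $u = 2$, $d = 1/2$, $r = 1/4$), and the function names are hypothetical. It builds $V_k$, $C_k$ and $\Delta_k$ and runs the wealth recursion along every path.

```python
from fractions import Fraction as F
from itertools import product

S0, u, d, r, n, K = F(4), F(2), F(1, 2), F(1, 4), 2, F(5)
p = (1 + r - d) / (u - d)   # risk-neutral probability of H
q = 1 - p

def S(w):
    """Stock price after the toss string w."""
    s = S0
    for toss in w:
        s *= u if toss == "H" else d
    return s

def V(w):
    """American put value after the tosses w, by backward induction."""
    intrinsic = max(K - S(w), F(0))
    if len(w) == n:
        return intrinsic
    return max(intrinsic, (p * V(w + "H") + q * V(w + "T")) / (1 + r))

def C(w):
    """Consumption C_k = V_k - E~[V_{k+1} | F_k] / (1 + r)."""
    return V(w) - (p * V(w + "H") + q * V(w + "T")) / (1 + r)

def delta(w):
    """Hedge ratio Delta_k at the node w."""
    return (V(w + "H") - V(w + "T")) / (S(w + "H") - S(w + "T"))

def wealth(w):
    """X_k along w, from X_0 = V_0 and the recursion of Lemma 2.21."""
    x = V("")
    for k in range(len(w)):
        pre = w[:k]
        x = delta(pre) * S(w[:k + 1]) + (1 + r) * (x - C(pre) - delta(pre) * S(pre))
    return x

nodes = ["".join(t) for k in range(n + 1) for t in product("HT", repeat=k)]
hedge_ok = all(wealth(w) == V(w) for w in nodes)
consumption = {w: C(w) for w in nodes if len(w) < n}
print(hedge_ok, consumption)
```

In this example the only node with positive consumption is the one reached after a first-toss T, where the put should have been exercised.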


6.3 Compound European Derivative Securities 


In order to derive the optimal stopping time for an American derivative security, it will be useful to 
study compound European derivative securities, which are also interesting in their own right. 


A compound European derivative security consists of $n + 1$ different simple European derivative
securities (with the same underlying stock) expiring at times $0, 1, \ldots, n$; the security that expires
at time $j$ has payoff $C_j$. Thus a compound European derivative security is specified by the process
$\{C_j\}_{j=0}^n$, where each $C_j$ is $\mathcal{F}_j$-measurable, i.e., the process $\{C_j\}_{j=0}^n$ is adapted to the filtration
$\{\mathcal{F}_k\}_{k=0}^n$.

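Hedging a short position (one payment). Here is how we can hedge a short position in the $j$'th
European derivative security. The value of European derivative security $j$ at time $k$ is given by

\[ V_k^{(j)} = (1+r)^k\, \tilde{\mathbb{E}}\left[ (1+r)^{-j} C_j \mid \mathcal{F}_k \right], \quad k = 0, \ldots, j, \]

and the hedging portfolio for that security is given by

\[ \Delta_k^{(j)}(\omega_1, \ldots, \omega_k) = \frac{V_{k+1}^{(j)}(\omega_1, \ldots, \omega_k, H) - V_{k+1}^{(j)}(\omega_1, \ldots, \omega_k, T)}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}, \quad k = 0, \ldots, j-1. \]

Thus, starting with wealth $V_0^{(j)}$ and using the portfolio $(\Delta_0^{(j)}, \ldots, \Delta_{j-1}^{(j)})$, we can ensure that at
time $j$ we have wealth $C_j$.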


Hedging a short position (all payments). Superpose the hedges for the individual payments. In
other words, start with wealth $V_0 = \sum_{j=0}^n V_0^{(j)}$. At each time $k \in \{0, 1, \ldots, n-1\}$, first make the
payment $C_k$ and then use the portfolio

\[ \Delta_k = \Delta_k^{(k+1)} + \Delta_k^{(k+2)} + \cdots + \Delta_k^{(n)} \]

corresponding to all future payments. At the final time $n$, after making the final payment $C_n$, we
will have exactly zero wealth.


Suppose you own a compound European derivative security $\{C_j\}_{j=0}^n$. Compute

\[ V_0 = \sum_{j=0}^n V_0^{(j)} = \tilde{\mathbb{E}}\left[ \sum_{j=0}^n (1+r)^{-j} C_j \right] \]

and the hedging portfolio $\{\Delta_k\}_{k=0}^{n-1}$. You can borrow $V_0$ and consume it immediately. This leaves
you with wealth $X_0 = -V_0$. In each period $k$, receive the payment $C_k$ and then use the portfolio
$-\Delta_k$. At the final time $n$, after receiving the last payment $C_n$, your wealth will reach zero, i.e., you
will no longer have a debt.


6.4 Optimal Exercise of American Derivative Security 


In this section we derive the optimal exercise time for the owner of an American derivative security.
Let $\{G_k\}_{k=0}^n$ be an American derivative security. Let $\tau$ be the stopping time the owner plans to
use. (We assume that each $G_k$ is non-negative, so we may assume without loss of generality that the
owner stops at expiration, i.e., at time $n$, if not before.) Using the stopping time $\tau$, in period $j$ the owner
will receive the payment

\[ C_j = \mathbb{I}_{\{\tau = j\}} G_j. \]

In other words, once he chooses a stopping time, the owner has effectively converted the American
derivative security into a compound European derivative security, whose value is

\[ V_0^{(\tau)} = \tilde{\mathbb{E}}\left[ \sum_{j=0}^n (1+r)^{-j} C_j \right]
= \tilde{\mathbb{E}}\left[ \sum_{j=0}^n (1+r)^{-j} \mathbb{I}_{\{\tau = j\}} G_j \right]
= \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \right]. \]

The owner of the American derivative security can borrow this amount of money immediately, if
he chooses, and invest in the market so as to exactly pay off his debt as the payments $\{C_j\}_{j=0}^n$ are
received. Thus, his optimal behavior is to use a stopping time $\tau$ which maximizes $V_0^{(\tau)}$.
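Lemma 4.22 $V_0^{(\tau)}$ is maximized by the stopping time

\[ \tau^* = \min\{k;\ V_k = G_k\}. \]

Proof: Recall the definition

\[ V_0 \triangleq \max_{\tau \in T_0} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \right] = \max_{\tau \in T_0} V_0^{(\tau)}. \]

Let $\tau'$ be a stopping time which maximizes $V_0^{(\tau)}$, i.e., $V_0 = \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} G_{\tau'} \right]$. Because $\{(1+r)^{-k} V_k\}_{k=0}^n$
is a supermartingale, we have from the optional sampling theorem and the inequality $V_k \ge G_k$, the
following:

\[ \begin{aligned}
V_0 &\ge \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} V_{\tau'} \mid \mathcal{F}_0 \right] \\
&= \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} V_{\tau'} \right] \\
&\ge \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} G_{\tau'} \right] \\
&= V_0.
\end{aligned} \]

Therefore,

\[ V_0 = \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} V_{\tau'} \right] = \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} G_{\tau'} \right], \]

and

\[ V_{\tau'} = G_{\tau'}, \text{ a.s.} \]

We have just shown that if $\tau'$ attains the maximum in the formula

\[ V_0 = \max_{\tau \in T_0} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \right], \tag{4.1} \]

then

\[ V_{\tau'} = G_{\tau'}, \text{ a.s.} \]

But we have defined

\[ \tau^* = \min\{k;\ V_k = G_k\}, \]

and so we must have $\tau^* \le \tau' \le n$ almost surely. The optional sampling theorem implies

\[ \begin{aligned}
(1+r)^{-\tau^*} G_{\tau^*} &= (1+r)^{-\tau^*} V_{\tau^*} \\
&\ge \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} V_{\tau'} \mid \mathcal{F}_{\tau^*} \right] \\
&= \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} G_{\tau'} \mid \mathcal{F}_{\tau^*} \right].
\end{aligned} \]

Taking expectations on both sides, we obtain

\[ \tilde{\mathbb{E}}\left[ (1+r)^{-\tau^*} G_{\tau^*} \right] \ge \tilde{\mathbb{E}}\left[ (1+r)^{-\tau'} G_{\tau'} \right] = V_0. \]

It follows that $\tau^*$ also attains the maximum in (4.1), and is therefore an optimal exercise time for
the American derivative security.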


Chapter 7 


Jensen’s Inequality 


7.1 Jensen’s Inequality for Conditional Expectations 


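Lemma 1.23 If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\varphi(X)| < \infty$, then

\[ \mathbb{E}[\varphi(X) \mid \mathcal{G}] \ge \varphi(\mathbb{E}[X \mid \mathcal{G}]). \]

For instance, if $\mathcal{G} = \{\emptyset, \Omega\}$, $\varphi(x) = x^2$:

\[ \mathbb{E} X^2 \ge (\mathbb{E} X)^2. \]

Proof: Since $\varphi$ is convex we can express it as follows (see Fig. 7.1):

\[ \varphi(x) = \max_{\substack{h \le \varphi \\ h \text{ is linear}}} h(x). \]

Now let $h(x) = ax + b$ lie below $\varphi$. Then,

\[ \begin{aligned}
\mathbb{E}[\varphi(X) \mid \mathcal{G}] &\ge \mathbb{E}[aX + b \mid \mathcal{G}] \\
&= a\, \mathbb{E}[X \mid \mathcal{G}] + b \\
&= h(\mathbb{E}[X \mid \mathcal{G}]).
\end{aligned} \]

This implies

\[ \mathbb{E}[\varphi(X) \mid \mathcal{G}] \ge \max_{\substack{h \le \varphi \\ h \text{ is linear}}} h(\mathbb{E}[X \mid \mathcal{G}]) = \varphi(\mathbb{E}[X \mid \mathcal{G}]). \]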




Figure 7.1: Expressing a convex function as a max over linear functions. 


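Theorem 1.24 If $\{Y_k\}_{k=0}^n$ is a martingale and $\varphi$ is convex then $\{\varphi(Y_k)\}_{k=0}^n$ is a submartingale.


Proof:

\[ \mathbb{E}[\varphi(Y_{k+1}) \mid \mathcal{F}_k] \ge \varphi(\mathbb{E}[Y_{k+1} \mid \mathcal{F}_k]) = \varphi(Y_k). \]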


7.2 Optimal Exercise of an American Call 


This follows from Jensen’s inequality. 


Corollary 2.25 Given a convex function $g : [0, \infty) \to \mathbb{R}$ where $g(0) = 0$. For instance, $g(x) =
(x - K)^+$ is the payoff function for an American call. Assume that $r \ge 0$. Consider the American
derivative security with payoff $g(S_k)$ in period $k$. The value of this security is the same as the value
of the simple European derivative security with final payoff $g(S_n)$, i.e.,

\[ \tilde{\mathbb{E}}\left[ (1+r)^{-n} g(S_n) \right] = \max_{\tau} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} g(S_\tau) \right], \]

where the LHS is the European value and the RHS is the American value. In particular $\tau = n$ is an
optimal exercise time.


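Proof: Because $g$ is convex, for all $\lambda \in [0, 1]$ we have (see Fig. 7.2):

\[ \begin{aligned}
g(\lambda x) &= g(\lambda x + (1 - \lambda) \cdot 0) \\
&\le \lambda g(x) + (1 - \lambda) \cdot g(0) \\
&= \lambda g(x).
\end{aligned} \]

[Figure 7.2 marks the points $(x, g(x))$, $(\lambda x, \lambda g(x))$ and $(\lambda x, g(\lambda x))$ on the graph of $g$.]

Figure 7.2: Proof of Cor. 2.25

Therefore,

\[ g\left( \frac{1}{1+r} S_{k+1} \right) \le \frac{1}{1+r}\, g(S_{k+1}), \]

and

\[ \begin{aligned}
\tilde{\mathbb{E}}\left[ (1+r)^{-(k+1)} g(S_{k+1}) \mid \mathcal{F}_k \right]
&= (1+r)^{-k}\, \tilde{\mathbb{E}}\left[ \frac{1}{1+r}\, g(S_{k+1}) \,\Big|\, \mathcal{F}_k \right] \\
&\ge (1+r)^{-k}\, \tilde{\mathbb{E}}\left[ g\left( \frac{1}{1+r} S_{k+1} \right) \Big|\, \mathcal{F}_k \right] \\
&\ge (1+r)^{-k}\, g\left( \tilde{\mathbb{E}}\left[ \frac{1}{1+r} S_{k+1} \,\Big|\, \mathcal{F}_k \right] \right) \\
&= (1+r)^{-k} g(S_k).
\end{aligned} \]

So $\{(1+r)^{-k} g(S_k)\}_{k=0}^n$ is a submartingale. Let $\tau$ be a stopping time satisfying $0 \le \tau \le n$. The
optional sampling theorem implies

\[ (1+r)^{-\tau} g(S_\tau) \le \tilde{\mathbb{E}}\left[ (1+r)^{-n} g(S_n) \mid \mathcal{F}_\tau \right]. \]

Taking expectations, we obtain

\[ \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} g(S_\tau) \right] \le \tilde{\mathbb{E}}\left( \tilde{\mathbb{E}}\left[ (1+r)^{-n} g(S_n) \mid \mathcal{F}_\tau \right] \right) = \tilde{\mathbb{E}}\left[ (1+r)^{-n} g(S_n) \right]. \]

Therefore, the value of the American derivative security is

\[ \max_{\tau} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} g(S_\tau) \right] \le \tilde{\mathbb{E}}\left[ (1+r)^{-n} g(S_n) \right], \]

and this last expression is the value of the European derivative security. Of course, the LHS cannot
be strictly less than the RHS above, since stopping at time $n$ is always allowed, and we conclude
that

\[ \max_{\tau} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} g(S_\tau) \right] = \tilde{\mathbb{E}}\left[ (1+r)^{-n} g(S_n) \right]. \]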
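Corollary 2.25 can be illustrated numerically. The sketch below is only an example under assumed parameters: a three-period binomial model with $S_0 = 4$, $u = 2$, $d = 1/2$, $r = 1/4$ and a call struck at 5, none of which are prescribed by the corollary. It prices the American and European versions by backward induction and compares them.

```python
from fractions import Fraction as F

# Assumed illustrative parameters (not taken from the corollary itself).
S0, u, d, r, n, K = F(4), F(2), F(1, 2), F(1, 4), 3, F(5)
p = (1 + r - d) / (u - d)      # risk-neutral probability of an up move
q = 1 - p

def g(s):
    """Call payoff: convex, with g(0) = 0."""
    return max(s - K, F(0))

def european(k, s):
    """Time-k value of the European claim paying g(S_n)."""
    if k == n:
        return g(s)
    return (p * european(k + 1, u * s) + q * european(k + 1, d * s)) / (1 + r)

def american(k, s):
    """Time-k value of the American claim paying g(S_k) on exercise."""
    if k == n:
        return g(s)
    cont = (p * american(k + 1, u * s) + q * american(k + 1, d * s)) / (1 + r)
    return max(g(s), cont)

same_value = american(0, S0) == european(0, S0)
print(american(0, S0), european(0, S0), same_value)
```

Replacing `g` by a put payoff, for which $g(0) \ne 0$, generally breaks the equality, which is consistent with the American put examples of Chapter 5.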


[Figure 7.3 shows the binomial tree $S_0 = 4$; $S_1(H) = 8$, $S_1(T) = 2$; $S_2(HH) = 16$, $S_2(HT) = 4$, $S_2(TH) = 4$, $S_2(TT) = 1$.]

Figure 7.3: A three period binomial model.


7.3 Stopped Martingales 


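Let $\{Y_k\}_{k=0}^n$ be a stochastic process and let $\tau$ be a stopping time. We denote by $\{Y_{k \wedge \tau}\}_{k=0}^n$ the
stopped process

\[ Y_{k \wedge \tau(\omega)}(\omega), \quad k = 0, 1, \ldots, n. \]

Example 7.1 (Stopped Process) Figure 7.3 shows our familiar 3-period binomial example.
Define

\[ \tau(\omega) = \begin{cases} 1 & \text{if } \omega_1 = T, \\ 2 & \text{if } \omega_1 = H. \end{cases} \]

Then

\[ S_{2 \wedge \tau(\omega)}(\omega) = \begin{cases}
S_2(HH) = 16 & \text{if } \omega = HH, \\
S_2(HT) = 4 & \text{if } \omega = HT, \\
S_1(T) = 2 & \text{if } \omega = TH, \\
S_1(T) = 2 & \text{if } \omega = TT.
\end{cases} \]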


Theorem 3.26 A stopped martingale (or submartingale, or supermartingale) is still a martingale 
(or submartingale, or supermartingale respectively). 


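Proof: Let $\{Y_k\}_{k=0}^n$ be a martingale, and $\tau$ be a stopping time. Choose some $k \in \{0, 1, \ldots, n\}$.
The set $\{\tau \le k\}$ is in $\mathcal{F}_k$, so the set $\{\tau \ge k+1\} = \{\tau \le k\}^c$ is also in $\mathcal{F}_k$. We compute

\[ \begin{aligned}
\mathbb{E}\left[ Y_{(k+1) \wedge \tau} \mid \mathcal{F}_k \right]
&= \mathbb{E}\left[ \mathbb{I}_{\{\tau \le k\}} Y_\tau + \mathbb{I}_{\{\tau \ge k+1\}} Y_{k+1} \mid \mathcal{F}_k \right] \\
&= \mathbb{I}_{\{\tau \le k\}} Y_\tau + \mathbb{I}_{\{\tau \ge k+1\}} \mathbb{E}\left[ Y_{k+1} \mid \mathcal{F}_k \right] \\
&= \mathbb{I}_{\{\tau \le k\}} Y_\tau + \mathbb{I}_{\{\tau \ge k+1\}} Y_k \\
&= Y_{k \wedge \tau}.
\end{aligned} \]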


Chapter 8 


Random Walks 


8.1 First Passage Time 


Toss a coin infinitely many times. Then the sample space $\Omega$ is the set of all infinite sequences
$\omega = (\omega_1, \omega_2, \ldots)$ of $H$ and $T$. Assume the tosses are independent, and on each toss, the probability
of $H$ is $\frac{1}{2}$, as is the probability of $T$. Define

\[ Y_j(\omega) = \begin{cases} 1 & \text{if } \omega_j = H, \\ -1 & \text{if } \omega_j = T, \end{cases} \]

\[ M_0 = 0, \qquad M_k = \sum_{j=1}^k Y_j, \quad k = 1, 2, \ldots \]

The process $\{M_k\}_{k=0}^\infty$ is a symmetric random walk (see Fig. 8.1). Its analogue in continuous time is
Brownian motion.

Define

\[ \tau = \min\{k \ge 0;\ M_k = 1\}. \]

If $M_k$ never gets to 1 (e.g., $\omega = (TTTT\ldots)$), then $\tau = \infty$. The random variable $\tau$ is called the
first passage time to 1. It is the first time the number of heads exceeds by one the number of tails.


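8.2 $\tau$ is almost surely finite


It is shown in a Homework Problem that $\{M_k\}_{k=0}^\infty$ and $\{N_k\}_{k=0}^\infty$, where

\[ N_k = \exp\left\{ \theta M_k - k \log\left( \frac{e^\theta + e^{-\theta}}{2} \right) \right\} = e^{\theta M_k} \left( \frac{2}{e^\theta + e^{-\theta}} \right)^k, \]

are martingales. (Take $M_k = -S_k$ in part (i) of the Homework Problem and take $\theta = -\sigma$ in part
(v).) Since $N_0 = 1$ and a stopped martingale is a martingale, we have

\[ 1 = \mathbb{E} N_{k \wedge \tau} = \mathbb{E}\left[ e^{\theta M_{k \wedge \tau}} \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{k \wedge \tau} \right] \tag{2.1} \]

for every fixed $\theta \in \mathbb{R}$ (see Fig. 8.2 for an illustration of the various functions involved). We want
to let $k \to \infty$ in (2.1), but we have to worry a bit that for some sequences $\omega \in \Omega$, $\tau(\omega) = \infty$.

Figure 8.1: The random walk process $M_k$

[Figure 8.2 plots the functions $\frac{e^\theta + e^{-\theta}}{2}$ and $\frac{2}{e^\theta + e^{-\theta}}$.]

Figure 8.2: Illustrating two functions of $\theta$

We consider fixed $\theta > 0$, so

\[ \frac{2}{e^\theta + e^{-\theta}} < 1. \]

As $k \to \infty$,

\[ \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{k \wedge \tau} \to \begin{cases} \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{\tau} & \text{if } \tau < \infty, \\ 0 & \text{if } \tau = \infty. \end{cases} \]

Furthermore, $M_{k \wedge \tau} \le 1$, because we stop this martingale when it reaches 1, so

\[ 0 \le e^{\theta M_{k \wedge \tau}} \le e^\theta \]

and

\[ 0 \le e^{\theta M_{k \wedge \tau}} \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{k \wedge \tau} \le e^\theta. \]

In addition,

\[ \lim_{k \to \infty} e^{\theta M_{k \wedge \tau}} \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{k \wedge \tau} = \begin{cases} e^\theta \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{\tau} & \text{if } \tau < \infty, \\ 0 & \text{if } \tau = \infty. \end{cases} \]

Recall Equation (2.1):

\[ \mathbb{E}\left[ e^{\theta M_{k \wedge \tau}} \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{k \wedge \tau} \right] = 1. \]

Letting $k \to \infty$, and using the Bounded Convergence Theorem, we obtain

\[ \mathbb{E}\left[ e^\theta \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{\tau} \mathbb{I}_{\{\tau < \infty\}} \right] = 1. \tag{2.2} \]

For all $\theta \in (0, 1]$, we have

\[ 0 \le e^\theta \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{\tau} \mathbb{I}_{\{\tau < \infty\}} \le e, \]

so we can let $\theta \downarrow 0$ in (2.2), using the Bounded Convergence Theorem again, to conclude

\[ \mathbb{E}\left[ \mathbb{I}_{\{\tau < \infty\}} \right] = 1, \]

i.e.,

\[ \mathbb{P}\{\tau < \infty\} = 1. \]

We know there are paths of the symmetric random walk $\{M_k\}_{k=0}^\infty$ which never reach level 1. We
have just shown that these paths collectively have no probability. (In our infinite sample space $\Omega$,
each path individually has zero probability.) We therefore do not need the indicator $\mathbb{I}_{\{\tau < \infty\}}$ in
(2.2), and we rewrite that equation as

\[ \mathbb{E}\left[ \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{\tau} \right] = e^{-\theta}. \tag{2.3} \]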


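8.3 The moment generating function for $\tau$


Let $\alpha \in (0, 1)$ be given. We want to find $\theta > 0$ so that

\[ \alpha = \frac{2}{e^\theta + e^{-\theta}}. \]

Solution:

\[ \alpha e^\theta + \alpha e^{-\theta} - 2 = 0, \]

\[ \alpha (e^{-\theta})^2 - 2 e^{-\theta} + \alpha = 0, \]

\[ e^{-\theta} = \frac{1 \pm \sqrt{1 - \alpha^2}}{\alpha}. \]

We want $\theta > 0$, so we must have $e^{-\theta} < 1$. Now $0 < \alpha < 1$, so

\[ 0 < (1 - \alpha)^2 < (1 - \alpha) < 1 - \alpha^2, \]

\[ 1 - \alpha < \sqrt{1 - \alpha^2}, \]

\[ 1 - \sqrt{1 - \alpha^2} < \alpha, \]

\[ \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} < 1. \]

We take the negative square root:

\[ e^{-\theta} = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}. \]

Recall Equation (2.3):

\[ \mathbb{E}\left[ \left( \frac{2}{e^\theta + e^{-\theta}} \right)^{\tau} \right] = e^{-\theta}, \quad \theta > 0. \]

With $\alpha \in (0, 1)$ and $\theta > 0$ related by

\[ e^{-\theta} = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \qquad \alpha = \frac{2}{e^\theta + e^{-\theta}}, \]

this becomes

\[ \mathbb{E} \alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad 0 < \alpha < 1. \tag{3.1} \]

We have computed the moment generating function for the first passage time to 1.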
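Formula (3.1) can be sanity-checked without any martingale argument by computing the law of $\tau$ directly. The following sketch propagates the distribution of the walk on paths that have not yet reached level 1; the truncation horizon of 400 steps and the test value $\alpha = 0.8$ are arbitrary choices made for this illustration.

```python
import math

def first_passage_probs(N):
    """P{tau = k} for k = 1..N, where tau is the first time the
    symmetric random walk started at 0 hits level 1."""
    alive = {0: 1.0}   # law of M_k restricted to paths that have not hit 1
    probs = {}
    for k in range(1, N + 1):
        nxt = {}
        for m, pr in alive.items():
            for step in (1, -1):
                nxt[m + step] = nxt.get(m + step, 0.0) + 0.5 * pr
        probs[k] = nxt.pop(1, 0.0)   # mass reaching level 1 for the first time
        alive = nxt
    return probs

probs = first_passage_probs(400)
alpha = 0.8
mgf_numeric = sum(alpha ** k * pk for k, pk in probs.items())
mgf_closed = (1 - math.sqrt(1 - alpha ** 2)) / alpha
print(mgf_numeric, mgf_closed)
```

The truncated sum and the closed form agree closely because the neglected tail is bounded by $\alpha^{400}$.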


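8.4 Expectation of $\tau$


Recall that

\[ \mathbb{E} \alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad 0 < \alpha < 1, \]

so

\[ \frac{d}{d\alpha} \mathbb{E} \alpha^\tau = \mathbb{E}(\tau \alpha^{\tau - 1}) = \frac{d}{d\alpha} \left( \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} \right) = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha^2 \sqrt{1 - \alpha^2}}. \]

Using the Monotone Convergence Theorem, we can let $\alpha \uparrow 1$ in the equation

\[ \mathbb{E}(\tau \alpha^{\tau - 1}) = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha^2 \sqrt{1 - \alpha^2}} \]

to obtain

\[ \mathbb{E} \tau = \infty. \]

Thus in summary:

\[ \tau \triangleq \min\{k;\ M_k = 1\}, \qquad \mathbb{P}\{\tau < \infty\} = 1, \qquad \mathbb{E} \tau = \infty. \]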


8.5 The Strong Markov Property 


The random walk process $\{M_k\}_{k=0}^\infty$ is a Markov process, i.e.,

\[ \mathbb{E}[\text{random variable depending only on } M_{k+1}, M_{k+2}, \ldots \mid \mathcal{F}_k] = \mathbb{E}[\text{same random variable} \mid M_k]. \]

In discrete time, this Markov property implies the Strong Markov property:

\[ \mathbb{E}[\text{random variable depending only on } M_{\tau+1}, M_{\tau+2}, \ldots \mid \mathcal{F}_\tau] = \mathbb{E}[\text{same random variable} \mid M_\tau] \]

for any almost surely finite stopping time $\tau$.


8.6 General First Passage Times 


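Define

\[ \tau_m \triangleq \min\{k \ge 0;\ M_k = m\}, \quad m = 1, 2, \ldots \]

Then $\tau_2 - \tau_1$ is the number of periods between the first arrival at level 1 and the first arrival at level
2. The distribution of $\tau_2 - \tau_1$ is the same as the distribution of $\tau_1$ (see Fig. 8.3), i.e.,

\[ \mathbb{E} \alpha^{\tau_2 - \tau_1} = \mathbb{E} \alpha^{\tau_1} = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad \alpha \in (0, 1). \]

Figure 8.3: General first passage times.

For $\alpha \in (0, 1)$,

\[ \begin{aligned}
\mathbb{E}[\alpha^{\tau_2} \mid \mathcal{F}_{\tau_1}] &= \mathbb{E}[\alpha^{\tau_1} \alpha^{\tau_2 - \tau_1} \mid \mathcal{F}_{\tau_1}] \\
&= \alpha^{\tau_1}\, \mathbb{E}[\alpha^{\tau_2 - \tau_1} \mid \mathcal{F}_{\tau_1}] && \text{(taking out what is known)} \\
&= \alpha^{\tau_1}\, \mathbb{E}[\alpha^{\tau_2 - \tau_1} \mid M_{\tau_1}] && \text{(strong Markov property)} \\
&= \alpha^{\tau_1}\, \mathbb{E}[\alpha^{\tau_2 - \tau_1}] && (M_{\tau_1} = 1, \text{ not random}) \\
&= \alpha^{\tau_1} \left( \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} \right).
\end{aligned} \]

Take expectations of both sides to get

\[ \mathbb{E} \alpha^{\tau_2} = \mathbb{E} \alpha^{\tau_1} \cdot \left( \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} \right) = \left( \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} \right)^2. \]

In general,

\[ \mathbb{E} \alpha^{\tau_m} = \left( \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} \right)^m, \quad \alpha \in (0, 1). \]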

8.7. Example: Perpetual American Put 


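Consider the binomial model, with $u = 2$, $d = \frac{1}{2}$, $r = \frac{1}{4}$, and payoff function $(5 - S_k)^+$. The risk
neutral probabilities are $\tilde{p} = \frac{1}{2}$, $\tilde{q} = \frac{1}{2}$, and thus

\[ S_k = S_0 u^{M_k}, \]

where $M_k$ is a symmetric random walk under the risk-neutral measure, denoted by $\tilde{\mathbb{P}}$. Suppose
$S_0 = 4$. Here are some possible exercise rules:

Rule 0: Stop immediately. $\tau_0 = 0$, $V^{(\tau_0)} = 1$.
Rule 1: Stop as soon as stock price falls to 2, i.e., at time
\[ \tau_{-1} \triangleq \min\{k;\ M_k = -1\}. \]
Rule 2: Stop as soon as stock price falls to 1, i.e., at time
\[ \tau_{-2} \triangleq \min\{k;\ M_k = -2\}. \]

Because the random walk is symmetric under $\tilde{\mathbb{P}}$, $\tau_{-m}$ has the same distribution under $\tilde{\mathbb{P}}$ as the
stopping time $\tau_m$ in the previous section. This observation leads to the following computations of
value. Value of Rule 1:

\[ \begin{aligned}
V^{(\tau_{-1})} &= \tilde{\mathbb{E}}\left[ (1+r)^{-\tau_{-1}} (5 - S_{\tau_{-1}})^+ \right] \\
&= (5 - 2)^+\, \tilde{\mathbb{E}}\left[ \left( \tfrac{4}{5} \right)^{\tau_{-1}} \right] \\
&= 3 \cdot \frac{1 - \sqrt{1 - \left( \frac{4}{5} \right)^2}}{\frac{4}{5}} \\
&= \frac{3}{2}.
\end{aligned} \]

Value of Rule 2:

\[ V^{(\tau_{-2})} = (5 - 1)^+\, \tilde{\mathbb{E}}\left[ \left( \tfrac{4}{5} \right)^{\tau_{-2}} \right] = 4 \cdot \left( \tfrac{1}{2} \right)^2 = 1. \]

This suggests that the optimal rule is Rule 1, i.e., stop (exercise the put) as soon as the stock price
falls to 2, and the value of the put is $\frac{3}{2}$ if $S_0 = 4$.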
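The values of these exercise rules follow from the first passage formula of Section 8.6 with $\alpha = 1/(1+r) = 4/5$. The sketch below evaluates them; the function names `passage_mgf` and `rule_value` are hypothetical, and the code assumes that both the initial price and the exercise level are powers of 2.

```python
import math

def passage_mgf(alpha, m):
    """E[alpha^{tau_m}] for the symmetric random walk (Section 8.6 formula)."""
    return ((1 - math.sqrt(1 - alpha ** 2)) / alpha) ** m

def rule_value(S0, level, K=5.0, r=0.25):
    """Value of the rule 'exercise the first time the price falls to level',
    for S0 >= level, with u = 2 and d = 1/2 (prices are powers of 2)."""
    m = round(math.log2(S0 / level))          # net down steps required
    return max(K - level, 0.0) * passage_mgf(1 / (1 + r), m)

rule0 = rule_value(4, 4)   # stop immediately
rule1 = rule_value(4, 2)   # stop when the price falls to 2
rule2 = rule_value(4, 1)   # stop when the price falls to 1
print(rule0, rule1, rule2, rule_value(8, 2))
```

Among the three rules compared, stopping at level 2 gives the largest value, in line with the discussion above.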


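Suppose instead we start with $S_0 = 8$, and stop the first time the price falls to 2. This requires 2
down steps, so the value of this rule with this initial stock price is

\[ (5 - 2)^+\, \tilde{\mathbb{E}}\left[ \left( \tfrac{4}{5} \right)^{\tau_{-2}} \right] = 3 \cdot \left( \tfrac{1}{2} \right)^2 = \frac{3}{4}. \]

In general, if $S_0 = 2^j$ for some $j \ge 1$, and we stop when the stock price falls to 2, then $j - 1$ down
steps will be required and the value of the option is

\[ (5 - 2)^+\, \tilde{\mathbb{E}}\left[ \left( \tfrac{4}{5} \right)^{\tau_{-(j-1)}} \right] = 3 \cdot \left( \tfrac{1}{2} \right)^{j-1}. \]

We define

\[ v(2^j) \triangleq 3 \cdot \left( \tfrac{1}{2} \right)^{j-1}, \quad j = 1, 2, 3, \ldots \]

If $S_0 = 2^j$ for some $j \le 1$, then the initial price is at or below 2. In this case, we exercise
immediately, and the value of the put is

\[ v(2^j) \triangleq 5 - 2^j, \quad j = 1, 0, -1, -2, \ldots \]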


Proposed exercise rule: Exercise the put whenever the stock price is at or below 2. The value of
this rule is given by $v(2^j)$ as we just defined it. Since the put is perpetual, the initial time is no
different from any other time. This leads us to make the following:


Conjecture 1 The value of the perpetual put at time $k$ is $v(S_k)$.


How do we recognize the value of an American derivative security when we see it? 


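There are three parts to the proof of the conjecture. We must show:
(a) $v(S_k) \ge (5 - S_k)^+ \ \forall k$,
(b) $\left\{ \left( \frac{4}{5} \right)^k v(S_k) \right\}_{k=0}^\infty$ is a supermartingale,
(c) $\{v(S_k)\}_{k=0}^\infty$ is the smallest process with properties (a) and (b).


Note: To simplify matters, we shall only consider initial stock prices of the form $S_0 = 2^j$, so $S_k$ is
always of the form $2^j$, with a possibly different $j$.


Proof: (a). Just check that

\[ v(2^j) \triangleq 3 \cdot \left( \tfrac{1}{2} \right)^{j-1} \ge (5 - 2^j)^+ \quad \text{for } j \ge 1, \]

\[ v(2^j) \triangleq 5 - 2^j \ge (5 - 2^j)^+ \quad \text{for } j \le 1. \]

This is straightforward.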
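Proof: (b). We must show that

\[ v(S_k) \ge \tilde{\mathbb{E}}\left[ \tfrac{4}{5}\, v(S_{k+1}) \mid \mathcal{F}_k \right] = \tfrac{4}{5} \cdot \tfrac{1}{2}\, v(2 S_k) + \tfrac{4}{5} \cdot \tfrac{1}{2}\, v\left( \tfrac{1}{2} S_k \right). \]

By assumption, $S_k = 2^j$ for some $j$. We must show that

\[ v(2^j) \ge \tfrac{2}{5}\, v(2^{j+1}) + \tfrac{2}{5}\, v(2^{j-1}). \]

If $j \ge 2$, then $v(2^j) = 3 \cdot \left( \frac{1}{2} \right)^{j-1}$ and

\[ \begin{aligned}
\tfrac{2}{5}\, v(2^{j+1}) + \tfrac{2}{5}\, v(2^{j-1})
&= \tfrac{2}{5} \cdot 3 \cdot \left( \tfrac{1}{2} \right)^{j} + \tfrac{2}{5} \cdot 3 \cdot \left( \tfrac{1}{2} \right)^{j-2} \\
&= 3 \cdot \tfrac{2}{5} \left( \tfrac{1}{4} + 1 \right) \left( \tfrac{1}{2} \right)^{j-2} \\
&= 3 \cdot \tfrac{1}{2} \left( \tfrac{1}{2} \right)^{j-2} \\
&= v(2^j).
\end{aligned} \]

If $j = 1$, then $v(2^j) = v(2) = 3$ and

\[ \tfrac{2}{5}\, v(4) + \tfrac{2}{5}\, v(1) = \tfrac{2}{5} \cdot \tfrac{3}{2} + \tfrac{2}{5} \cdot 4 = \tfrac{3}{5} + \tfrac{8}{5} = \tfrac{11}{5}. \]

There is a gap of size $\frac{4}{5}$.

If $j \le 0$, then $v(2^j) = 5 - 2^j$ and

\[ \begin{aligned}
\tfrac{2}{5}\, v(2^{j+1}) + \tfrac{2}{5}\, v(2^{j-1})
&= \tfrac{2}{5} (5 - 2^{j+1}) + \tfrac{2}{5} (5 - 2^{j-1}) \\
&= 4 - \tfrac{2}{5} (4 + 1)\, 2^{j-1} \\
&= 4 - 2^j.
\end{aligned} \]

There is a gap of size 1. This concludes the proof of (b).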
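The three cases in the proof of (b), together with property (a), can be checked in exact arithmetic over a range of exponents. This is a minimal sketch; the range $-10 \le j \le 10$ is an arbitrary choice for the illustration, and `gap` is a helper name introduced here.

```python
from fractions import Fraction as F

def v(j):
    """Candidate value v(2^j) of the perpetual put with strike 5."""
    if j >= 1:
        return 3 * F(1, 2) ** (j - 1)
    return 5 - F(2) ** j

def gap(j):
    """v(2^j) minus the discounted one-step risk-neutral average of v."""
    return v(j) - F(2, 5) * (v(j + 1) + v(j - 1))

js = range(-10, 11)
dominates_payoff = all(v(j) >= max(5 - F(2) ** j, 0) for j in js)   # property (a)
gaps = {j: gap(j) for j in js}                                      # property (b)
print(dominates_payoff, gaps[2], gaps[1], gaps[0])
```

A zero gap means the discounted process is locally a martingale there (no exercise), while a positive gap appears exactly where the put is exercised.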
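Proof: (c). Suppose $\{Y_k\}_{k=0}^\infty$ is some other process satisfying:
(a') $Y_k \ge (5 - S_k)^+ \ \forall k$,
(b') $\left\{ \left( \frac{4}{5} \right)^k Y_k \right\}_{k=0}^\infty$ is a supermartingale.
We must show that

\[ Y_k \ge v(S_k) \quad \forall k. \tag{7.1} \]

Actually, since the put is perpetual, every time $k$ is like every other time, so it will suffice to show

\[ Y_0 \ge v(S_0), \tag{7.2} \]

provided we let $S_0$ in (7.2) be any number of the form $2^j$. With appropriate (but messy) conditioning
on $\mathcal{F}_k$, the proof we give of (7.2) can be modified to prove (7.1).


For $j \le 1$,

\[ v(2^j) = 5 - 2^j = (5 - 2^j)^+, \]

so if $S_0 = 2^j$ for some $j \le 1$, then (a') implies

\[ Y_0 \ge (5 - 2^j)^+ = v(S_0). \]

Suppose now that $S_0 = 2^j$ for some $j \ge 2$, i.e., $S_0 \ge 4$. Let

\[ \tau = \min\{k;\ S_k = 2\} = \min\{k;\ M_k = -(j - 1)\}. \]

Then

\[ v(S_0) = v(2^j) = 3 \cdot \left( \tfrac{1}{2} \right)^{j-1} = \tilde{\mathbb{E}}\left[ \left( \tfrac{4}{5} \right)^{\tau} (5 - S_\tau)^+ \right]. \]

Because $\left\{ \left( \frac{4}{5} \right)^k Y_k \right\}_{k=0}^\infty$ is a supermartingale,

\[ Y_0 \ge \tilde{\mathbb{E}}\left[ \left( \tfrac{4}{5} \right)^{\tau} Y_\tau \right] \ge \tilde{\mathbb{E}}\left[ \left( \tfrac{4}{5} \right)^{\tau} (5 - S_\tau)^+ \right] = v(S_0). \]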


Comment on the proof of (c): If the candidate value process is the actual value of a particular
exercise rule, then (c) will be automatically satisfied. In this case, we constructed $v$ so that $v(S_k)$ is
the value of the put at time $k$ if the stock price at time $k$ is $S_k$ and if we exercise the put the first time
($k$, or later) that the stock price is 2 or less. In such a situation, we need only verify properties (a)
and (b).


8.8 Difference Equation 


If we imagine stock prices which can fall at any point in $(0, \infty)$, not just at points of the form $2^j$ for
integers $j$, then we can imagine the function $v(x)$, defined for all $x > 0$, which gives the value of
the perpetual American put when the stock price is $x$. This function should satisfy the conditions:


(a) $v(x) \ge (K - x)^+, \ \forall x$,
(b) $v(x) \ge \frac{1}{1+r} \left[ \tilde{p}\, v(ux) + \tilde{q}\, v(dx) \right], \ \forall x$,
(c) At each $x$, either (a) or (b) holds with equality.


In the example we worked out, we have

\[ \text{For } j \ge 1: \quad v(2^j) = 3 \cdot \left( \tfrac{1}{2} \right)^{j-1} = \frac{6}{2^j}, \]

\[ \text{For } j \le 1: \quad v(2^j) = 5 - 2^j. \]

This suggests the formula

\[ v(x) = \begin{cases} \dfrac{6}{x}, & x \ge 3, \\ 5 - x, & 0 < x \le 3. \end{cases} \]

We then have (see Fig. 8.4):
(a) $v(x) \ge (5 - x)^+, \ \forall x$,
(b) $v(x) \ge \frac{4}{5} \left[ \frac{1}{2} v(2x) + \frac{1}{2} v\left( \frac{x}{2} \right) \right]$ for every $x$ except for $2 < x < 4$.


Figure 8.4: Graph of $v(x)$.


Check of condition (c):

• If $0 < x \le 3$, then (a) holds with equality.

• If $x \ge 6$, then (b) holds with equality:

\[ \tfrac{4}{5} \left[ \tfrac{1}{2} v(2x) + \tfrac{1}{2} v\left( \tfrac{x}{2} \right) \right] = \tfrac{4}{5} \left[ \tfrac{1}{2} \cdot \frac{6}{2x} + \tfrac{1}{2} \cdot \frac{12}{x} \right] = \frac{6}{x}. \]

• If $3 < x < 4$ or $4 < x < 6$, then both (a) and (b) are strict. This is an artifact of the
discreteness of the binomial model. This artifact will disappear in the continuous model, in
which an analogue of (a) or (b) holds with equality at every point.
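

8.9 Distribution of First Passage Times


Let $\{M_k\}_{k=0}^\infty$ be a symmetric random walk under a probability measure $\mathbb{P}$, with $M_0 = 0$. Defining

\[ \tau = \min\{k \ge 0;\ M_k = 1\}, \]

we recall that

\[ \mathbb{E} \alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad 0 < \alpha < 1. \]

We will use this moment generating function to obtain the distribution of $\tau$. We first obtain the
Taylor series expansion of $\mathbb{E} \alpha^\tau$ as follows:

\[ \begin{aligned}
f(x) &= 1 - \sqrt{1 - x}, & f(0) &= 0, \\
f'(x) &= \tfrac{1}{2} (1 - x)^{-1/2}, & f'(0) &= \tfrac{1}{2}, \\
f''(x) &= \tfrac{1}{4} (1 - x)^{-3/2}, & f''(0) &= \tfrac{1}{4}, \\
f'''(x) &= \tfrac{3}{8} (1 - x)^{-5/2}, & f'''(0) &= \tfrac{3}{8}, \\
&\ \ \vdots \\
f^{(j)}(x) &= \frac{1 \times 3 \times \cdots \times (2j - 3)}{2^j} (1 - x)^{-(2j-1)/2},
\end{aligned} \]

\[ \begin{aligned}
f^{(j)}(0) &= \frac{1 \times 3 \times \cdots \times (2j - 3)}{2^j} \\
&= \frac{1 \times 3 \times \cdots \times (2j - 3)}{2^j} \cdot \frac{2 \times 4 \times \cdots \times (2j - 2)}{2^{j-1} (j - 1)!} \\
&= \left( \tfrac{1}{2} \right)^{2j-1} \frac{(2j - 2)!}{(j - 1)!}.
\end{aligned} \]

The Taylor series expansion of $f(x)$ is given by

\[ \begin{aligned}
f(x) = 1 - \sqrt{1 - x} &= \sum_{j=0}^\infty \frac{1}{j!} f^{(j)}(0)\, x^j \\
&= \sum_{j=1}^\infty \left( \tfrac{1}{2} \right)^{2j-1} \frac{(2j - 2)!}{j!\, (j - 1)!}\, x^j \\
&= \frac{x}{2} + \sum_{j=2}^\infty \left( \tfrac{1}{2} \right)^{2j-1} \frac{1}{j - 1} \binom{2j - 2}{j} x^j.
\end{aligned} \]

So we have

\[ \mathbb{E} \alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} = \frac{1}{\alpha} f(\alpha^2) = \frac{\alpha}{2} + \sum_{j=2}^\infty \left( \frac{\alpha}{2} \right)^{2j-1} \frac{1}{j - 1} \binom{2j - 2}{j}. \]

But also,

\[ \mathbb{E} \alpha^\tau = \sum_{j=1}^\infty \alpha^{2j-1}\, \mathbb{P}\{\tau = 2j - 1\}. \]

Figure 8.5: Reflection principle.

Figure 8.6: Example with $j = 2$.

Therefore,

\[ \mathbb{P}\{\tau = 1\} = \frac{1}{2}, \]

\[ \mathbb{P}\{\tau = 2j - 1\} = \left( \tfrac{1}{2} \right)^{2j-1} \frac{1}{j - 1} \binom{2j - 2}{j}, \quad j = 2, 3, \ldots \]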

8.10 The Reflection Principle 


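To count how many paths reach level 1 by time $2j - 1$, count all those for which $M_{2j-1} = 1$ and
double count all those for which $M_{2j-1} \ge 3$. (See Figures 8.5, 8.6.)

In other words,

\[ \begin{aligned}
\mathbb{P}\{\tau \le 2j - 1\} &= \mathbb{P}\{M_{2j-1} = 1\} + 2\, \mathbb{P}\{M_{2j-1} \ge 3\} \\
&= \mathbb{P}\{M_{2j-1} = 1\} + \mathbb{P}\{M_{2j-1} \ge 3\} + \mathbb{P}\{M_{2j-1} \le -3\} \\
&= 1 - \mathbb{P}\{M_{2j-1} = -1\}.
\end{aligned} \]

For $j \ge 2$,

\[ \begin{aligned}
\mathbb{P}\{\tau = 2j - 1\} &= \mathbb{P}\{\tau \le 2j - 1\} - \mathbb{P}\{\tau \le 2j - 3\} \\
&= \left[ 1 - \mathbb{P}\{M_{2j-1} = -1\} \right] - \left[ 1 - \mathbb{P}\{M_{2j-3} = -1\} \right] \\
&= \mathbb{P}\{M_{2j-3} = -1\} - \mathbb{P}\{M_{2j-1} = -1\} \\
&= \left( \tfrac{1}{2} \right)^{2j-3} \frac{(2j - 3)!}{(j - 1)!\, (j - 2)!} - \left( \tfrac{1}{2} \right)^{2j-1} \frac{(2j - 1)!}{j!\, (j - 1)!} \\
&= \left( \tfrac{1}{2} \right)^{2j-1} \frac{(2j - 3)!}{j!\, (j - 1)!} \left[ 4 j (j - 1) - (2j - 1)(2j - 2) \right] \\
&= \left( \tfrac{1}{2} \right)^{2j-1} \frac{(2j - 3)!}{j!\, (j - 1)!} \left[ 2j (2j - 2) - (2j - 1)(2j - 2) \right] \\
&= \left( \tfrac{1}{2} \right)^{2j-1} \frac{(2j - 2)!}{j!\, (j - 1)!} \\
&= \left( \tfrac{1}{2} \right)^{2j-1} \frac{1}{j - 1} \binom{2j - 2}{j}.
\end{aligned} \]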
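The first passage distribution can be confirmed for small $j$ by listing paths. The sketch below compares three quantities: the closed formula, a brute-force enumeration of all $2^{2j-1}$ paths, and the reflection principle difference $\mathbb{P}\{M_{2j-3} = -1\} - \mathbb{P}\{M_{2j-1} = -1\}$. The function names are illustrative, and only $j \le 6$ is checked to keep the enumeration small.

```python
from fractions import Fraction as F
from itertools import product
from math import comb

def formula(j):
    """P{tau = 2j - 1} from the closed formula (with P{tau = 1} = 1/2)."""
    if j == 1:
        return F(1, 2)
    return F(1, 2) ** (2 * j - 1) * F(comb(2 * j - 2, j), j - 1)

def by_enumeration(j):
    """P{tau = 2j - 1} by listing all 2^(2j-1) equally likely paths."""
    length = 2 * j - 1
    hits = 0
    for steps in product((1, -1), repeat=length):
        m = 0
        for k, s in enumerate(steps, start=1):
            m += s
            if m == 1:                  # first arrival at level 1
                hits += (k == length)   # count only arrivals exactly at time 2j - 1
                break
    return F(hits, 2 ** length)

def by_reflection(j):
    """P{M_{2j-3} = -1} - P{M_{2j-1} = -1}, valid for j >= 2."""
    return (F(comb(2 * j - 3, j - 1), 2 ** (2 * j - 3))
            - F(comb(2 * j - 1, j), 2 ** (2 * j - 1)))

print([formula(j) for j in range(1, 5)])
```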


Chapter 9 


Pricing in terms of Market Probabilities: 
The Radon-Nikodym Theorem. 


9.1 Radon-Nikodym Theorem 


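Theorem 1.27 (Radon-Nikodym) Let $\mathbb{P}$ and $\tilde{\mathbb{P}}$ be two probability measures on a space $(\Omega, \mathcal{F})$.
Assume that for every $A \in \mathcal{F}$ satisfying $\mathbb{P}(A) = 0$, we also have $\tilde{\mathbb{P}}(A) = 0$. Then we say that
$\tilde{\mathbb{P}}$ is absolutely continuous with respect to $\mathbb{P}$. Under this assumption, there is a nonnegative random
variable $Z$ such that

\[ \tilde{\mathbb{P}}(A) = \int_A Z \, d\mathbb{P}, \quad \forall A \in \mathcal{F}, \tag{1.1} \]

and $Z$ is called the Radon-Nikodym derivative of $\tilde{\mathbb{P}}$ with respect to $\mathbb{P}$.


Remark 9.1 Equation (1.1) implies the apparently stronger condition

\[ \tilde{\mathbb{E}} X = \mathbb{E}[XZ] \]

for every random variable $X$ for which $\mathbb{E}|XZ| < \infty$.


Remark 9.2 If $\tilde{\mathbb{P}}$ is absolutely continuous with respect to $\mathbb{P}$, and $\mathbb{P}$ is absolutely continuous with
respect to $\tilde{\mathbb{P}}$, we say that $\mathbb{P}$ and $\tilde{\mathbb{P}}$ are equivalent. $\mathbb{P}$ and $\tilde{\mathbb{P}}$ are equivalent if and only if

\[ \mathbb{P}(A) = 0 \text{ exactly when } \tilde{\mathbb{P}}(A) = 0, \quad \forall A \in \mathcal{F}. \]

If $\mathbb{P}$ and $\tilde{\mathbb{P}}$ are equivalent and $Z$ is the Radon-Nikodym derivative of $\tilde{\mathbb{P}}$ w.r.t. $\mathbb{P}$, then $\frac{1}{Z}$ is the
Radon-Nikodym derivative of $\mathbb{P}$ w.r.t. $\tilde{\mathbb{P}}$, i.e.,

\[ \tilde{\mathbb{E}} X = \mathbb{E}[XZ] \quad \forall X, \tag{1.2} \]
\[ \mathbb{E} Y = \tilde{\mathbb{E}}\left[ Y \cdot \tfrac{1}{Z} \right] \quad \forall Y. \tag{1.3} \]

(Let $X$ and $Y$ be related by the equation $Y = XZ$ to see that (1.2) and (1.3) are the same.)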



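Example 9.1 (Radon-Nikodym Theorem) Let $\Omega = \{HH, HT, TH, TT\}$, the set of coin toss sequences
of length 2. Let $\mathbb{P}$ correspond to probability $\frac{1}{3}$ for $H$ and $\frac{2}{3}$ for $T$, and let $\tilde{\mathbb{P}}$ correspond to probability $\frac{1}{2}$ for
$H$ and $\frac{1}{2}$ for $T$. Then $Z(\omega) = \frac{\tilde{\mathbb{P}}(\omega)}{\mathbb{P}(\omega)}$, so

\[ Z(HH) = \frac{9}{4}, \quad Z(HT) = \frac{9}{8}, \quad Z(TH) = \frac{9}{8}, \quad Z(TT) = \frac{9}{16}. \]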


9.2. Radon-Nikodym Martingales 


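Let $\Omega$ be the set of all sequences of $n$ coin tosses. Let $\mathbb{P}$ be the market probability measure and let
$\tilde{\mathbb{P}}$ be the risk-neutral probability measure. Assume

\[ \mathbb{P}(\omega) > 0, \quad \tilde{\mathbb{P}}(\omega) > 0, \quad \forall \omega \in \Omega, \]

so that $\mathbb{P}$ and $\tilde{\mathbb{P}}$ are equivalent. The Radon-Nikodym derivative of $\tilde{\mathbb{P}}$ with respect to $\mathbb{P}$ is

\[ Z(\omega) = \frac{\tilde{\mathbb{P}}(\omega)}{\mathbb{P}(\omega)}. \]

Define the $\mathbb{P}$-martingale

\[ Z_k \triangleq \mathbb{E}[Z \mid \mathcal{F}_k], \quad k = 0, 1, \ldots, n. \]

We can check that $Z_k$ is indeed a martingale:

\[ \mathbb{E}[Z_{k+1} \mid \mathcal{F}_k] = \mathbb{E}\left[ \mathbb{E}[Z \mid \mathcal{F}_{k+1}] \mid \mathcal{F}_k \right] = \mathbb{E}[Z \mid \mathcal{F}_k] = Z_k. \]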

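Lemma 2.28 If $X$ is $\mathcal{F}_k$-measurable, then $\tilde{\mathbb{E}} X = \mathbb{E}[X Z_k]$.


Proof:

\[ \begin{aligned}
\tilde{\mathbb{E}} X &= \mathbb{E}[XZ] \\
&= \mathbb{E}\left[ \mathbb{E}[XZ \mid \mathcal{F}_k] \right] \\
&= \mathbb{E}\left[ X\, \mathbb{E}[Z \mid \mathcal{F}_k] \right] \\
&= \mathbb{E}[X Z_k].
\end{aligned} \]

Note that Lemma 2.28 implies that if $X$ is $\mathcal{F}_k$-measurable, then for any $A \in \mathcal{F}_k$,

\[ \tilde{\mathbb{E}}[\mathbb{I}_A X] = \mathbb{E}[Z_k \mathbb{I}_A X], \]

or equivalently,

\[ \int_A X \, d\tilde{\mathbb{P}} = \int_A X Z_k \, d\mathbb{P}. \]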


[Figure 9.1 shows the tree of values $Z_0 = 1$; $Z_1(H) = 3/2$, $Z_1(T) = 3/4$; $Z_2(HH) = 9/4$, $Z_2(HT) = 9/8$, $Z_2(TH) = 9/8$, $Z_2(TT) = 9/16$, with branch probabilities $1/3$ for $H$ and $2/3$ for $T$.]

Figure 9.1: Showing the $Z_k$ values in the 2-period binomial model example. The probabilities shown
are for $\mathbb{P}$, not $\tilde{\mathbb{P}}$.
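The values in Figure 9.1 can be recomputed from the definitions $Z = \tilde{\mathbb{P}}/\mathbb{P}$ and $Z_k = \mathbb{E}[Z \mid \mathcal{F}_k]$. The following is a small sketch for the two-toss example; the dictionary and function names are chosen here for illustration.

```python
from fractions import Fraction as F

n = 2
p_mkt = {"H": F(1, 3), "T": F(2, 3)}   # market measure P, per toss
p_rn = {"H": F(1, 2), "T": F(1, 2)}    # risk-neutral measure P~, per toss

def prob(w, one_toss):
    """Probability of the toss string w under independent tosses."""
    out = F(1)
    for toss in w:
        out *= one_toss[toss]
    return out

def Z(w):
    """Z_k at the node reached by the tosses w (k = len(w)).
    At k = n this is the Radon-Nikodym derivative itself; earlier nodes
    are conditional expectations under the market measure."""
    if len(w) == n:
        return prob(w, p_rn) / prob(w, p_mkt)
    return p_mkt["H"] * Z(w + "H") + p_mkt["T"] * Z(w + "T")

print(Z(""), Z("H"), Z("T"), Z("HH"), Z("HT"), Z("TT"))
```

Averaging is done with the market probabilities because $Z_k$ is a martingale under $\mathbb{P}$, not under $\tilde{\mathbb{P}}$.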


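Lemma 2.29 If $X$ is $\mathcal{F}_k$-measurable and $0 \le j \le k$, then

\[ \tilde{\mathbb{E}}[X \mid \mathcal{F}_j] = \frac{1}{Z_j} \mathbb{E}[X Z_k \mid \mathcal{F}_j]. \]

Proof: Note first that $\frac{1}{Z_j} \mathbb{E}[X Z_k \mid \mathcal{F}_j]$ is $\mathcal{F}_j$-measurable. So for any $A \in \mathcal{F}_j$, we have

\[ \begin{aligned}
\int_A \frac{1}{Z_j} \mathbb{E}[X Z_k \mid \mathcal{F}_j] \, d\tilde{\mathbb{P}} &= \int_A \mathbb{E}[X Z_k \mid \mathcal{F}_j] \, d\mathbb{P} && \text{(Lemma 2.28)} \\
&= \int_A X Z_k \, d\mathbb{P} && \text{(Partial averaging)} \\
&= \int_A X \, d\tilde{\mathbb{P}} && \text{(Lemma 2.28)}
\end{aligned} \]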

Example 9.2 (Radon-Nikodym Theorem, continued) We show in Fig. 9.1 the values of the martingale $Z_k$.
We always have $Z_0 = 1$, since

\[ Z_0 = \mathbb{E} Z = \int_\Omega Z \, d\mathbb{P} = \tilde{\mathbb{P}}(\Omega) = 1. \]


9.3. The State Price Density Process 


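In order to express the value of a derivative security in terms of the market probabilities, it will be
useful to introduce the following state price density process:

\[ \zeta_k = (1+r)^{-k} Z_k, \quad k = 0, \ldots, n. \]

We then have the following pricing formulas: For a Simple European derivative security with
payoff $C_k$ at time $k$,

\[ \begin{aligned}
V_0 &= \tilde{\mathbb{E}}\left[ (1+r)^{-k} C_k \right] \\
&= \mathbb{E}\left[ (1+r)^{-k} Z_k C_k \right] && \text{(Lemma 2.28)} \\
&= \mathbb{E}[\zeta_k C_k].
\end{aligned} \]

More generally for $0 \le j \le k$,

\[ \begin{aligned}
V_j &= (1+r)^j\, \tilde{\mathbb{E}}\left[ (1+r)^{-k} C_k \mid \mathcal{F}_j \right] \\
&= \frac{(1+r)^j}{Z_j} \mathbb{E}\left[ (1+r)^{-k} Z_k C_k \mid \mathcal{F}_j \right] && \text{(Lemma 2.29)} \\
&= \frac{1}{\zeta_j} \mathbb{E}[\zeta_k C_k \mid \mathcal{F}_j].
\end{aligned} \]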

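Remark 9.3 $\{\zeta_j V_j\}_{j=0}^k$ is a martingale under $\mathbb{P}$, as we can check below:

\[ \begin{aligned}
\mathbb{E}[\zeta_{j+1} V_{j+1} \mid \mathcal{F}_j] &= \mathbb{E}\left[ \mathbb{E}[\zeta_k C_k \mid \mathcal{F}_{j+1}] \mid \mathcal{F}_j \right] \\
&= \mathbb{E}[\zeta_k C_k \mid \mathcal{F}_j] \\
&= \zeta_j V_j.
\end{aligned} \]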

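Now for an American derivative security $\{G_k\}_{k=0}^n$:

\[ \begin{aligned}
V_0 &= \sup_{\tau \in T_0} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \right] \\
&= \sup_{\tau \in T_0} \mathbb{E}\left[ (1+r)^{-\tau} Z_\tau G_\tau \right] \\
&= \sup_{\tau \in T_0} \mathbb{E}[\zeta_\tau G_\tau].
\end{aligned} \]

More generally for $0 \le j \le n$,

\[ \begin{aligned}
V_j &= (1+r)^j \sup_{\tau \in T_j} \tilde{\mathbb{E}}\left[ (1+r)^{-\tau} G_\tau \mid \mathcal{F}_j \right] \\
&= (1+r)^j \sup_{\tau \in T_j} \frac{1}{Z_j} \mathbb{E}\left[ (1+r)^{-\tau} Z_\tau G_\tau \mid \mathcal{F}_j \right] \\
&= \frac{1}{\zeta_j} \sup_{\tau \in T_j} \mathbb{E}[\zeta_\tau G_\tau \mid \mathcal{F}_j].
\end{aligned} \]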

Remark 9.4 Note that

(a) $\{\zeta_j V_j\}_{j=0}^n$ is a supermartingale under $\mathbb{P}$,

(b) $\zeta_j V_j \ge \zeta_j G_j \ \forall j$,

(c) $\{\zeta_j V_j\}_{j=0}^n$ is the smallest process having properties (a) and (b).


We interpret $\zeta_k$ by observing that $\zeta_k(\omega) \mathbb{P}(\omega)$ is the value at time zero of a contract which pays \$1
at time $k$ if $\omega$ occurs.


[Figure 9.2 shows the tree of stock prices and state prices: $S_0 = 4$, $\zeta_0 = 1$; $S_1(H) = 8$, $\zeta_1(H) = 1.20$; $S_1(T) = 2$, $\zeta_1(T) = 0.6$; $S_2(HH) = 16$, $\zeta_2(HH) = 1.44$; $S_2(HT) = S_2(TH) = 4$, $\zeta_2(HT) = \zeta_2(TH) = 0.72$; $S_2(TT) = 1$, $\zeta_2(TT) = 0.36$; branch probabilities $1/3$ for $H$ and $2/3$ for $T$.]

Figure 9.2: Showing the state price values $\zeta_k$. The probabilities shown are for $\mathbb{P}$, not $\tilde{\mathbb{P}}$.

Example 9.3 (Radon-Nikodym Theorem, continued) We illustrate the use of the valuation formulas for
European and American derivative securities in terms of market probabilities. Recall that $p = \frac{1}{3}$, $q = \frac{2}{3}$. The
state price values $\zeta_k$ are shown in Fig. 9.2.


For a European Call with strike price 5, expiration time 2, we have

\[ V_2(HH) = 11, \qquad \zeta_2(HH) V_2(HH) = 1.44 \times 11 = 15.84, \]

\[ V_2(HT) = V_2(TH) = V_2(TT) = 0, \]

\[ V_0 = \tfrac{1}{3} \times \tfrac{1}{3} \times 15.84 = 1.76, \]

\[ \frac{\zeta_2(HH)}{\zeta_1(H)} V_2(HH) = \frac{1.44}{1.20} \times 11 = 1.20 \times 11 = 13.20, \]

\[ V_1(H) = \tfrac{1}{3} \times 13.20 = 4.40. \]

Compare with the risk-neutral pricing formulas:

\[ V_1(H) = \tfrac{2}{5} V_2(HH) + \tfrac{2}{5} V_2(HT) = \tfrac{2}{5} \times 11 = 4.40, \]
\[ V_1(T) = \tfrac{2}{5} V_2(TH) + \tfrac{2}{5} V_2(TT) = 0, \]
\[ V_0 = \tfrac{2}{5} V_1(H) + \tfrac{2}{5} V_1(T) = \tfrac{2}{5} \times 4.40 = 1.76. \]

Now consider an American put with strike price 5 and expiration time 2. Fig. 9.3 shows the values of
$\zeta_k (5 - S_k)^+$. We compute the value of the put under various stopping times $\tau$:
(0) Stop immediately: value is 1.
(1) If $\tau(HH) = \tau(HT) = 2$, $\tau(TH) = \tau(TT) = 1$, the value is

\[ \tfrac{1}{3} \times \tfrac{2}{3} \times 0.72 + \tfrac{2}{3} \times 1.80 = 1.36. \]

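(2) If we stop at time 2, the value is

\[ \tfrac{1}{3} \times \tfrac{2}{3} \times 0.72 + \tfrac{2}{3} \times \tfrac{1}{3} \times 0.72 + \tfrac{2}{3} \times \tfrac{2}{3} \times 1.44 = 0.96. \]

We see that (1) is the optimal stopping rule.


[Figure 9.3 shows the tree of intrinsic values and state-price-weighted intrinsic values: $(5 - S_0)^+ = 1$, $\zeta_0 (5 - S_0)^+ = 1$; $(5 - S_1(H))^+ = 0$, $\zeta_1(H)(5 - S_1(H))^+ = 0$; $(5 - S_1(T))^+ = 3$, $\zeta_1(T)(5 - S_1(T))^+ = 1.80$; $(5 - S_2(HH))^+ = 0$, $\zeta_2(HH)(5 - S_2(HH))^+ = 0$; $(5 - S_2(HT))^+ = 1$, $\zeta_2(HT)(5 - S_2(HT))^+ = 0.72$; $(5 - S_2(TH))^+ = 1$, $\zeta_2(TH)(5 - S_2(TH))^+ = 0.72$; $(5 - S_2(TT))^+ = 4$, $\zeta_2(TT)(5 - S_2(TT))^+ = 1.44$; branch probabilities $1/3$ for $H$ and $2/3$ for $T$.]

Figure 9.3: Showing the values $\zeta_k (5 - S_k)^+$ for an American put. The probabilities shown are for
$\mathbb{P}$, not $\tilde{\mathbb{P}}$.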
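The numbers in Example 9.3 can be reproduced by summing $\mathbb{P}(\omega)\, \zeta\, G$ over the nodes at which a rule stops. The sketch below hard-codes the stock prices and $Z_k$ values from Figures 9.1 and 9.2; the helper `value` and the description of a rule as a list of stopping nodes are conventions introduced only for this illustration.

```python
from fractions import Fraction as F

r = F(1, 4)
p, q = F(1, 3), F(2, 3)    # market probabilities of H and T
S = {"": 4, "H": 8, "T": 2, "HH": 16, "HT": 4, "TH": 4, "TT": 1}
Z = {"": F(1), "H": F(3, 2), "T": F(3, 4),
     "HH": F(9, 4), "HT": F(9, 8), "TH": F(9, 8), "TT": F(9, 16)}

def zeta(w):
    """State price density zeta_k = (1 + r)^(-k) Z_k at the node w."""
    return Z[w] / (1 + r) ** len(w)

def P(w):
    """Market probability of reaching the node w."""
    out = F(1)
    for toss in w:
        out *= p if toss == "H" else q
    return out

def value(stopping_nodes, payoff):
    """E[zeta_tau G_tau] for a rule given by the nodes at which it stops."""
    return sum(P(w) * zeta(w) * payoff(S[w]) for w in stopping_nodes)

def call(s):
    return max(s - 5, 0)

def put(s):
    return max(5 - s, 0)

european_call = value(["HH", "HT", "TH", "TT"], call)
put_rule0 = value([""], put)                        # stop immediately
put_rule1 = value(["HH", "HT", "T"], put)           # stop at 1 after T, else at 2
put_rule2 = value(["HH", "HT", "TH", "TT"], put)    # always stop at time 2
print(european_call, put_rule0, put_rule1, put_rule2)
```

Each list of stopping nodes must partition the sample space, which is what makes it correspond to a stopping time.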


9.4 Stochastic Volatility Binomial Model 


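Let $\Omega$ be the set of sequences of $n$ tosses, and let $0 < d_k < 1 + r_k < u_k$, where for each $k$, $d_k, u_k, r_k$
are $\mathcal{F}_k$-measurable. Also let

\[ \tilde{p}_k = \frac{1 + r_k - d_k}{u_k - d_k}, \qquad \tilde{q}_k = \frac{u_k - (1 + r_k)}{u_k - d_k}. \]

Let $\tilde{\mathbb{P}}$ be the risk-neutral probability measure:

\[ \tilde{\mathbb{P}}\{\omega_1 = H\} = \tilde{p}_0, \qquad \tilde{\mathbb{P}}\{\omega_1 = T\} = \tilde{q}_0, \]

and for $2 \le k \le n$,

\[ \tilde{\mathbb{P}}[\omega_{k+1} = H \mid \mathcal{F}_k] = \tilde{p}_k, \qquad \tilde{\mathbb{P}}[\omega_{k+1} = T \mid \mathcal{F}_k] = \tilde{q}_k. \]

Let $\mathbb{P}$ be the market probability measure, and assume $\mathbb{P}\{\omega\} > 0 \ \forall \omega \in \Omega$. Then $\mathbb{P}$ and $\tilde{\mathbb{P}}$ are
equivalent. Define

\[ Z(\omega) = \frac{\tilde{\mathbb{P}}(\omega)}{\mathbb{P}(\omega)} \quad \forall \omega \in \Omega, \qquad Z_k = \mathbb{E}[Z \mid \mathcal{F}_k], \quad k = 0, 1, \ldots, n. \]

We define the money market price process as follows:

\[ M_0 = 1, \qquad M_k = (1 + r_{k-1}) M_{k-1}, \quad k = 1, \ldots, n. \]

Note that $M_k$ is $\mathcal{F}_{k-1}$-measurable.


We then define the state price process to be

\[ \zeta_k = \frac{1}{M_k} Z_k, \quad k = 0, \ldots, n. \]

As before the portfolio process is $\{\Delta_k\}_{k=0}^{n-1}$. The self-financing value process (wealth process)
consists of $X_0$, the non-random initial wealth, and

\[ X_{k+1} = \Delta_k S_{k+1} + (1 + r_k)(X_k - \Delta_k S_k), \quad k = 0, \ldots, n-1. \]

Then the following processes are martingales under $\tilde{\mathbb{P}}$:

\[ \left\{ \frac{1}{M_k} S_k \right\}_{k=0}^n \quad \text{and} \quad \left\{ \frac{1}{M_k} X_k \right\}_{k=0}^n, \]

and the following processes are martingales under $\mathbb{P}$:

\[ \{\zeta_k S_k\}_{k=0}^n \quad \text{and} \quad \{\zeta_k X_k\}_{k=0}^n. \]

We thus have the following pricing formulas:


Simple European derivative security with payoff $C_k$ at time $k$:

\[ V_j = M_j\, \tilde{\mathbb{E}}\left[ \frac{C_k}{M_k} \,\Big|\, \mathcal{F}_j \right] = \frac{1}{\zeta_j} \mathbb{E}[\zeta_k C_k \mid \mathcal{F}_j]. \]

American derivative security $\{G_k\}_{k=0}^n$:

\[ V_j = M_j \sup_{\tau \in T_j} \tilde{\mathbb{E}}\left[ \frac{G_\tau}{M_\tau} \,\Big|\, \mathcal{F}_j \right] = \frac{1}{\zeta_j} \sup_{\tau \in T_j} \mathbb{E}[\zeta_\tau G_\tau \mid \mathcal{F}_j]. \]

The usual hedging portfolio formulas still work.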




9.5 Another Application of the Radon-Nikodym Theorem


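Let $(\Omega, \mathcal{F}, Q)$ be a probability space. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$, and let $X$ be a non-negative
random variable with $\int_\Omega X \, dQ = 1$. We construct the conditional expectation (under $Q$) of $X$
given $\mathcal{G}$. On $\mathcal{G}$, define two probability measures

\[ \mathbb{P}(A) = Q(A) \quad \forall A \in \mathcal{G}; \]

\[ \tilde{\mathbb{P}}(A) = \int_A X \, dQ \quad \forall A \in \mathcal{G}. \]

Whenever $Y$ is a $\mathcal{G}$-measurable random variable, we have

\[ \int_\Omega Y \, d\mathbb{P} = \int_\Omega Y \, dQ; \]

if $Y = \mathbb{I}_A$ for some $A \in \mathcal{G}$, this is just the definition of $\mathbb{P}$, and the rest follows from the "standard
machine". If $A \in \mathcal{G}$ and $\mathbb{P}(A) = 0$, then $Q(A) = 0$, so $\tilde{\mathbb{P}}(A) = 0$. In other words, the measure $\tilde{\mathbb{P}}$
is absolutely continuous with respect to the measure $\mathbb{P}$. The Radon-Nikodym theorem implies that
there exists a $\mathcal{G}$-measurable random variable $Z$ such that

\[ \tilde{\mathbb{P}}(A) = \int_A Z \, d\mathbb{P} \quad \forall A \in \mathcal{G}, \]

i.e.,

\[ \int_A X \, dQ = \int_A Z \, d\mathbb{P} \quad \forall A \in \mathcal{G}. \]

This shows that $Z$ has the "partial averaging" property, and since $Z$ is $\mathcal{G}$-measurable, it is the
conditional expectation (under the probability measure $Q$) of $X$ given $\mathcal{G}$. The existence of conditional
expectations is a consequence of the Radon-Nikodym theorem.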


Chapter 10 


Capital Asset Pricing 


10.1 An Optimization Problem 


Consider an agent who has initial wealth $X_0$ and wants to invest in the stock and money markets so as to maximize

$$\mathbb{E} \log X_n.$$

Remark 10.1 Regardless of the portfolio used by the agent, $\{\zeta_k X_k\}_{k=0}^{n}$ is a martingale under $\mathbb{P}$, so

$$\mathbb{E}\, \zeta_n X_n = X_0. \qquad \text{(BC)}$$

Here, (BC) stands for "Budget Constraint".

Remark 10.2 If $\xi$ is any random variable satisfying (BC), i.e.,

$$\mathbb{E}\, \zeta_n \xi = X_0,$$

then there is a portfolio which starts with initial wealth $X_0$ and produces $X_n = \xi$ at time $n$. To see this, just regard $\xi$ as a simple European derivative security paying off at time $n$. Then $X_0$ is its value at time 0, and starting from this value, there is a hedging portfolio which produces $X_n = \xi$.


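Remarks 10.1 and 10.2 show that the optimal $X_n$ for the capital asset pricing problem can be obtained by solving the following

Constrained Optimization Problem:

Find a random variable $\xi$ which solves:

Maximize $\mathbb{E} \log \xi$

Subject to $\mathbb{E}\, \zeta_n \xi = X_0$.

Equivalently, we wish to

Maximize $\sum_{\omega \in \Omega} (\log \xi(\omega))\, \mathbb{P}(\omega)$

Subject to $\sum_{\omega \in \Omega} \zeta_n(\omega) \xi(\omega) \mathbb{P}(\omega) - X_0 = 0$.

There are $2^n$ sequences $\omega$ in $\Omega$. Call them $\omega_1, \omega_2, \ldots, \omega_{2^n}$. Adopt the notation

$$x_1 = \xi(\omega_1), \quad x_2 = \xi(\omega_2), \quad \ldots, \quad x_{2^n} = \xi(\omega_{2^n}).$$

We can thus restate the problem as:

Maximize $\sum_{k=1}^{2^n} (\log x_k)\, \mathbb{P}(\omega_k)$

Subject to $\sum_{k=1}^{2^n} \zeta_n(\omega_k)\, x_k\, \mathbb{P}(\omega_k) - X_0 = 0$.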


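In order to solve this problem we use:

Theorem 1.30 (Lagrange Multiplier) If $(x_1^*, \ldots, x_m^*)$ solve the problem

Maximize $f(x_1, \ldots, x_m)$

Subject to $g(x_1, \ldots, x_m) = 0$,

then there is a number $\lambda$ such that

$$\frac{\partial}{\partial x_k} f(x_1^*, \ldots, x_m^*) = \lambda \frac{\partial}{\partial x_k} g(x_1^*, \ldots, x_m^*), \quad k = 1, \ldots, m, \qquad (1.1)$$

and

$$g(x_1^*, \ldots, x_m^*) = 0. \qquad (1.2)$$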


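For our problem, (1.1) and (1.2) become

$$\frac{1}{x_k^*}\, \mathbb{P}(\omega_k) = \lambda\, \zeta_n(\omega_k)\, \mathbb{P}(\omega_k), \quad k = 1, \ldots, 2^n, \qquad (1.1')$$

$$\sum_{k=1}^{2^n} \zeta_n(\omega_k)\, x_k^*\, \mathbb{P}(\omega_k) = X_0. \qquad (1.2')$$

Equation (1.1') implies

$$x_k^* = \frac{1}{\lambda\, \zeta_n(\omega_k)}.$$

Plugging this into (1.2') we get

$$\frac{1}{\lambda} \sum_{k=1}^{2^n} \mathbb{P}(\omega_k) = X_0 \implies \frac{1}{\lambda} = X_0.$$

Therefore,

$$x_k^* = \frac{X_0}{\zeta_n(\omega_k)}, \quad k = 1, \ldots, 2^n.$$

Thus we have shown that if $\xi^*$ solves the problem

Maximize $\mathbb{E} \log \xi$

Subject to $\mathbb{E}(\zeta_n \xi) = X_0$, $\qquad (1.3)$

then

$$\xi^* = \frac{X_0}{\zeta_n}. \qquad (1.4)$$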
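Theorem 1.31 If $\xi^*$ is given by (1.4), then $\xi^*$ solves the problem (1.3).

Proof: Fix $Z > 0$ and define

$$f(x) = \log x - xZ.$$

We maximize $f$ over $x > 0$:

$$f'(x) = \frac{1}{x} - Z = 0 \iff x = \frac{1}{Z},$$

$$f''(x) = -\frac{1}{x^2} < 0 \quad \forall x > 0.$$

The function $f$ is maximized at $x^* = \frac{1}{Z}$, i.e.,

$$\log x - xZ \le f(x^*) = \log \frac{1}{Z} - 1, \quad \forall x > 0,\ \forall Z > 0. \qquad (1.5)$$

Let $\xi$ be any random variable satisfying

$$\mathbb{E}(\zeta_n \xi) = X_0,$$

and let

$$\xi^* = \frac{X_0}{\zeta_n}.$$

From (1.5), with $x = \xi$ and $Z = \zeta_n / X_0$, we have

$$\log \xi - \xi\, \frac{\zeta_n}{X_0} \le \log \frac{X_0}{\zeta_n} - 1.$$

Taking expectations, we have

$$\mathbb{E} \log \xi - \frac{1}{X_0}\, \mathbb{E}(\zeta_n \xi) \le \mathbb{E} \log \xi^* - 1,$$

and so, since $\mathbb{E}(\zeta_n \xi) = X_0$,

$$\mathbb{E} \log \xi \le \mathbb{E} \log \xi^*.$$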




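In summary, capital asset pricing works as follows: Consider an agent who has initial wealth $X_0$ and wants to invest in the stock and money market so as to maximize

$$\mathbb{E} \log X_n.$$

The optimal $X_n$ is $X_n^* = \frac{X_0}{\zeta_n}$, i.e.,

$$\zeta_n X_n^* = X_0.$$

Since $\{\zeta_k X_k^*\}_{k=0}^{n}$ is a martingale under $\mathbb{P}$, we have

$$\zeta_k X_k^* = \mathbb{E}[\zeta_n X_n^* \mid \mathcal{F}_k] = X_0, \quad k = 0, \ldots, n,$$

so

$$X_k^* = \frac{X_0}{\zeta_k},$$

and the optimal portfolio is given by

$$\Delta_k^*(\omega_1, \ldots, \omega_k) = \frac{\dfrac{X_0}{\zeta_{k+1}(\omega_1, \ldots, \omega_k, H)} - \dfrac{X_0}{\zeta_{k+1}(\omega_1, \ldots, \omega_k, T)}}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}.$$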
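
The result $X_n^* = X_0/\zeta_n$ can be checked numerically. The following is a minimal sketch, assuming a binomial model with constant $u$, $d$, $r$ and a constant market probability $p$ of heads; the function name `log_optimal_wealth` and all parameter values are illustrative assumptions, not part of the text. It verifies the budget constraint (BC) and compares the expected log-wealth with that of a competing strategy that also satisfies (BC), namely holding everything in the money market.

```python
from itertools import product
from math import log


def log_optimal_wealth(n, u, d, r, p, x0):
    """Map each path omega to (P(omega), zeta_n(omega), X_n*(omega)).

    Assumes constant u, d, r and i.i.d. tosses with market probability p of H
    (illustrative assumptions only).
    """
    p_tilde = (1 + r - d) / (u - d)  # risk-neutral probability of H
    q_tilde = 1 - p_tilde
    table = {}
    for omega in product("HT", repeat=n):
        heads = omega.count("H")
        prob = p ** heads * (1 - p) ** (n - heads)             # market measure
        prob_rn = p_tilde ** heads * q_tilde ** (n - heads)    # risk-neutral
        zeta = (prob_rn / prob) / (1 + r) ** n                 # state price
        table[omega] = (prob, zeta, x0 / zeta)                 # X_n* = X_0/zeta_n
    return table


x0 = 10.0
table = log_optimal_wealth(n=3, u=2.0, d=0.5, r=0.25, p=0.6, x0=x0)

# Budget constraint (BC): E[zeta_n X_n*] should equal X_0.
budget = sum(prob * zeta * x for prob, zeta, x in table.values())

# Expected log-wealth of the optimal strategy.
best = sum(prob * log(x) for prob, _, x in table.values())

# Competing strategy: all wealth in the money market, X_n = X_0 (1+r)^n.
bank = log(x0 * 1.25 ** 3)
```

With these parameters `budget` reproduces $X_0$ up to rounding, and `best` exceeds `bank`, as Theorem 1.31 predicts for any competitor satisfying (BC).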


Chapter 11 


General Random Variables 


11.1 Law of a Random Variable 


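Thus far we have considered only random variables whose domain and range are discrete. We now consider a general random variable $X : \Omega \to \mathbb{R}$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Recall that:

- $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$.
- $\mathbb{P}$ is a probability measure on $\mathcal{F}$, i.e., $\mathbb{P}(A)$ is defined for every $A \in \mathcal{F}$.

A function $X : \Omega \to \mathbb{R}$ is a random variable if and only if for every $B \in \mathcal{B}(\mathbb{R})$ (the $\sigma$-algebra of Borel subsets of $\mathbb{R}$), the set

$$\{X \in B\} \triangleq X^{-1}(B) \triangleq \{\omega;\ X(\omega) \in B\} \in \mathcal{F},$$

i.e., $X : \Omega \to \mathbb{R}$ is a random variable if and only if $X^{-1}$ is a function from $\mathcal{B}(\mathbb{R})$ to $\mathcal{F}$ (see Fig. 11.1).

Thus any random variable $X$ induces a measure $\mu_X$ on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by

$$\mu_X(B) = \mathbb{P}\left(X^{-1}(B)\right) \quad \forall B \in \mathcal{B}(\mathbb{R}),$$

where the probability on the right is defined since $X^{-1}(B) \in \mathcal{F}$. $\mu_X$ is often called the Law of $X$; in Williams' book this is denoted by $\mathcal{L}_X$.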


11.2 Density of a Random Variable

The density of $X$ (if it exists) is a function $f_X : \mathbb{R} \to [0, \infty)$ such that

$$\mu_X(B) = \int_B f_X(x) \, dx \quad \forall B \in \mathcal{B}(\mathbb{R}).$$

Figure 11.1: Illustrating a real-valued random variable X.

We then write

$$d\mu_X(x) = f_X(x) \, dx,$$

where the integral is with respect to the Lebesgue measure on $\mathbb{R}$. $f_X$ is the Radon-Nikodym derivative of $\mu_X$ with respect to the Lebesgue measure. Thus $X$ has a density if and only if $\mu_X$ is absolutely continuous with respect to Lebesgue measure, which means that whenever $B \in \mathcal{B}(\mathbb{R})$ has Lebesgue measure zero, then

$$\mathbb{P}\{X \in B\} = 0.$$


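11.3 Expectation

Theorem 3.32 (Expectation of a function of X) Let $h : \mathbb{R} \to \mathbb{R}$ be given. Then

$$\mathbb{E} h(X) \triangleq \int_\Omega h(X(\omega)) \, d\mathbb{P}(\omega) = \int_{\mathbb{R}} h(x) \, d\mu_X(x) = \int_{\mathbb{R}} h(x) f_X(x) \, dx.$$

Proof: (Sketch). If $h(x) = 1_B(x)$ for some $B \subset \mathbb{R}$, then these equations are

$$\mathbb{E} 1_B(X) \triangleq \mathbb{P}\{X \in B\} = \mu_X(B) = \int_B f_X(x) \, dx,$$

which are true by definition. Now use the "standard machine" to get the equations for general $h$.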


Figure 11.2: Two real-valued random variables X, Y.


11.4 Two random variables 


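Let $X, Y$ be two random variables $\Omega \to \mathbb{R}$ defined on the space $(\Omega, \mathcal{F}, \mathbb{P})$. Then $X, Y$ induce a measure on $\mathcal{B}(\mathbb{R}^2)$ (see Fig. 11.2) called the joint law of $(X, Y)$, defined by

$$\mu_{X,Y}(C) \triangleq \mathbb{P}\{(X, Y) \in C\} \quad \forall C \in \mathcal{B}(\mathbb{R}^2).$$

The joint density of $(X, Y)$ is a function

$$f_{X,Y} : \mathbb{R}^2 \to [0, \infty)$$

that satisfies

$$\mu_{X,Y}(C) = \iint_C f_{X,Y}(x, y) \, dx \, dy \quad \forall C \in \mathcal{B}(\mathbb{R}^2).$$

$f_{X,Y}$ is the Radon-Nikodym derivative of $\mu_{X,Y}$ with respect to the Lebesgue measure (area) on $\mathbb{R}^2$.

We compute the expectation of a function of $X, Y$ in a manner analogous to the univariate case:

$$\mathbb{E} k(X, Y) \triangleq \int_\Omega k(X(\omega), Y(\omega)) \, d\mathbb{P}(\omega) = \iint_{\mathbb{R}^2} k(x, y) \, d\mu_{X,Y}(x, y) = \iint_{\mathbb{R}^2} k(x, y) f_{X,Y}(x, y) \, dx \, dy.$$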
R 


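11.5 Marginal Density

Suppose $(X, Y)$ has joint density $f_{X,Y}$. Let $B \subset \mathbb{R}$ be given. Then

$$\mu_Y(B) = \mathbb{P}\{Y \in B\} = \mathbb{P}\{(X, Y) \in \mathbb{R} \times B\} = \mu_{X,Y}(\mathbb{R} \times B) = \int_B \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx \, dy = \int_B f_Y(y) \, dy,$$

where

$$f_Y(y) \triangleq \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx.$$

Therefore, $f_Y(y)$ is the (marginal) density for $Y$.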


11.6 Conditional Expectation 


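Suppose $(X, Y)$ has joint density $f_{X,Y}$. Let $h : \mathbb{R} \to \mathbb{R}$ be given. Recall that $\mathbb{E}[h(X) \mid Y] \triangleq \mathbb{E}[h(X) \mid \sigma(Y)]$ depends on $\omega$ through $Y$, i.e., there is a function $g(y)$ ($g$ depending on $h$) such that

$$\mathbb{E}[h(X) \mid Y](\omega) = g(Y(\omega)).$$

How do we determine $g$?

We can characterize $g$ using partial averaging: Recall that $A \in \sigma(Y) \iff A = \{Y \in B\}$ for some $B \in \mathcal{B}(\mathbb{R})$. Then the following are equivalent characterizations of $g$:

$$\int_A g(Y) \, d\mathbb{P} = \int_A h(X) \, d\mathbb{P} \quad \forall A \in \sigma(Y), \qquad (6.1)$$

$$\int_\Omega 1_B(Y) g(Y) \, d\mathbb{P} = \int_\Omega 1_B(Y) h(X) \, d\mathbb{P} \quad \forall B \in \mathcal{B}(\mathbb{R}), \qquad (6.2)$$

$$\int_{\mathbb{R}} 1_B(y) g(y) \, \mu_Y(dy) = \iint_{\mathbb{R}^2} 1_B(y) h(x) \, d\mu_{X,Y}(x, y) \quad \forall B \in \mathcal{B}(\mathbb{R}), \qquad (6.3)$$

$$\int_B g(y) f_Y(y) \, dy = \int_B \int_{\mathbb{R}} h(x) f_{X,Y}(x, y) \, dx \, dy \quad \forall B \in \mathcal{B}(\mathbb{R}). \qquad (6.4)$$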


11.7 Conditional Density

A function $f_{X|Y}(x|y) : \mathbb{R}^2 \to [0, \infty)$ is called a conditional density for $X$ given $Y$ provided that for any function $h : \mathbb{R} \to \mathbb{R}$:

$$g(y) = \int_{-\infty}^{\infty} h(x) f_{X|Y}(x|y) \, dx. \qquad (7.1)$$

(Here $g$ is the function satisfying

$$\mathbb{E}[h(X) \mid Y] = g(Y),$$

and $g$ depends on $h$, but $f_{X|Y}$ does not.)

Theorem 7.33 If $(X, Y)$ has a joint density $f_{X,Y}$, then

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}. \qquad (7.2)$$

Proof: Just verify that $g$ defined by (7.1) satisfies (6.4): For $B \in \mathcal{B}(\mathbb{R})$,

$$\int_B \underbrace{\int_{-\infty}^{\infty} h(x) f_{X|Y}(x|y) \, dx}_{g(y)}\ f_Y(y) \, dy = \int_B \int_{-\infty}^{\infty} h(x) f_{X,Y}(x, y) \, dx \, dy.$$

Notation 11.1 Let $g$ be the function satisfying

$$\mathbb{E}[h(X) \mid Y] = g(Y).$$

The function $g$ is often written as

$$g(y) = \mathbb{E}[h(X) \mid Y = y],$$

and (7.1) becomes

$$\mathbb{E}[h(X) \mid Y = y] = \int_{-\infty}^{\infty} h(x) f_{X|Y}(x|y) \, dx.$$

In conclusion, to determine $\mathbb{E}[h(X) \mid Y]$ (a function of $\omega$), first compute

$$g(y) = \int_{-\infty}^{\infty} h(x) f_{X|Y}(x|y) \, dx,$$

and then replace the dummy variable $y$ by the random variable $Y$:

$$\mathbb{E}[h(X) \mid Y](\omega) = g(Y(\omega)).$$


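Example 11.1 (Jointly normal random variables) Given parameters: $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$. Let $(X, Y)$ have the joint density

$$f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{x^2}{\sigma_1^2} - 2\rho \frac{x}{\sigma_1} \frac{y}{\sigma_2} + \frac{y^2}{\sigma_2^2} \right] \right\}.$$

The exponent is

$$-\frac{1}{2(1 - \rho^2)} \left[ \frac{x^2}{\sigma_1^2} - 2\rho \frac{x}{\sigma_1} \frac{y}{\sigma_2} + \frac{y^2}{\sigma_2^2} \right] = -\frac{1}{2(1 - \rho^2)\sigma_1^2} \left( x - \frac{\rho \sigma_1}{\sigma_2} y \right)^2 - \frac{y^2}{2\sigma_2^2}.$$

We can compute the marginal density of $Y$ as follows:

$$f_Y(y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2(1 - \rho^2)\sigma_1^2} \left( x - \frac{\rho \sigma_1}{\sigma_2} y \right)^2 \right\} dx \cdot e^{-\frac{y^2}{2\sigma_2^2}}$$

$$= \frac{1}{2\pi \sigma_2} \int_{-\infty}^{\infty} e^{-\frac{u^2}{2}} \, du \cdot e^{-\frac{y^2}{2\sigma_2^2}}$$

using the substitution $u = \frac{1}{\sqrt{1 - \rho^2}\, \sigma_1} \left( x - \frac{\rho \sigma_1}{\sigma_2} y \right)$, $du = \frac{dx}{\sqrt{1 - \rho^2}\, \sigma_1}$,

$$= \frac{1}{\sqrt{2\pi}\, \sigma_2}\, e^{-\frac{y^2}{2\sigma_2^2}}.$$

Thus $Y$ is normal with mean 0 and variance $\sigma_2^2$.

Conditional density. From the expressions

$$f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)\sigma_1^2} \left( x - \frac{\rho \sigma_1}{\sigma_2} y \right)^2 \right\} e^{-\frac{y^2}{2\sigma_2^2}},$$

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\, \sigma_2}\, e^{-\frac{y^2}{2\sigma_2^2}},$$

we have

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi}\, \sigma_1 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)\sigma_1^2} \left( x - \frac{\rho \sigma_1}{\sigma_2} y \right)^2 \right\}.$$

In the $x$-variable, $f_{X|Y}(x|y)$ is a normal density with mean $\frac{\rho \sigma_1}{\sigma_2} y$ and variance $(1 - \rho^2)\sigma_1^2$. Therefore,

$$\mathbb{E}[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y) \, dx = \frac{\rho \sigma_1}{\sigma_2} y,$$

$$\mathbb{E}\left[ \left( X - \frac{\rho \sigma_1}{\sigma_2} y \right)^2 \,\Big|\, Y = y \right] = \int_{-\infty}^{\infty} \left( x - \frac{\rho \sigma_1}{\sigma_2} y \right)^2 f_{X|Y}(x|y) \, dx = (1 - \rho^2)\sigma_1^2.$$

From the above two formulas we have the formulas

$$\mathbb{E}[X \mid Y] = \frac{\rho \sigma_1}{\sigma_2} Y, \qquad (7.3)$$

$$\mathbb{E}\left[ \left( X - \frac{\rho \sigma_1}{\sigma_2} Y \right)^2 \,\Big|\, Y \right] = (1 - \rho^2)\sigma_1^2. \qquad (7.4)$$

Taking expectations in (7.3) and (7.4) yields

$$\mathbb{E} X = \frac{\rho \sigma_1}{\sigma_2}\, \mathbb{E} Y = 0, \qquad (7.5)$$

$$\mathbb{E}\left( X - \frac{\rho \sigma_1}{\sigma_2} Y \right)^2 = (1 - \rho^2)\sigma_1^2. \qquad (7.6)$$

Based on $Y$, the best estimator of $X$ is $\frac{\rho \sigma_1}{\sigma_2} Y$. This estimator is unbiased (has expected error zero) and the expected square error is $(1 - \rho^2)\sigma_1^2$. No other estimator based on $Y$ can have a smaller expected square error (Homework problem 2.1).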
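
Formulas (7.3) and (7.6) lend themselves to a Monte Carlo sanity check. The sketch below is illustrative only: it assumes the pair is built from two independent standard normals as $X = \sigma_1 Z_1$, $Y = \sigma_2(\rho Z_1 + \sqrt{1-\rho^2}\, Z_2)$ (one standard construction of a jointly normal pair with correlation $\rho$), and the parameter values and names are hypothetical. It estimates the least-squares coefficient of $X$ on $Y$ and the mean square error of the resulting estimator.

```python
import random
from math import sqrt

# Illustrative parameter values (assumptions, not from the text).
sigma1, sigma2, rho = 2.0, 0.5, 0.6
rng = random.Random(0)  # fixed seed so the run is reproducible

n_samples = 100_000
xs, ys = [], []
for _ in range(n_samples):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(sigma1 * z1)
    ys.append(sigma2 * (rho * z1 + sqrt(1 - rho ** 2) * z2))

# Least-squares coefficient of X on Y (both have mean zero).
slope_hat = sum(x * y for x, y in zip(xs, ys)) / sum(y * y for y in ys)

# Mean square error of the estimator slope_hat * Y.
mse_hat = sum((x - slope_hat * y) ** 2 for x, y in zip(xs, ys)) / n_samples

slope_theory = rho * sigma1 / sigma2          # coefficient in (7.3)
mse_theory = (1 - rho ** 2) * sigma1 ** 2     # right-hand side of (7.6)
```

Up to sampling error, `slope_hat` should be close to `slope_theory` and `mse_hat` close to `mse_theory`.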


11.8 Multivariate Normal Distribution 


Please see Oksendal Appendix A. 


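Let $X$ denote the column vector of random variables $(X_1, X_2, \ldots, X_n)^T$, and $x$ the corresponding column vector of values $(x_1, x_2, \ldots, x_n)^T$. $X$ has a multivariate normal distribution if and only if the random variables have the joint density

$$f_X(x) = \frac{\sqrt{\det A}}{(2\pi)^{n/2}} \exp\left\{ -\tfrac{1}{2} (x - \mu)^T A\, (x - \mu) \right\}.$$

Here,

$$\mu \triangleq (\mu_1, \ldots, \mu_n)^T = \mathbb{E} X \triangleq (\mathbb{E} X_1, \ldots, \mathbb{E} X_n)^T,$$

and $A$ is an $n \times n$ nonsingular matrix. $A^{-1}$ is the covariance matrix

$$A^{-1} = \mathbb{E}\left[ (X - \mu)(X - \mu)^T \right],$$

i.e., the $(i, j)$th element of $A^{-1}$ is $\mathbb{E}(X_i - \mu_i)(X_j - \mu_j)$. The random variables in $X$ are independent if and only if $A^{-1}$ is diagonal, i.e.,

$$A^{-1} = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2),$$

where $\sigma_j^2 = \mathbb{E}(X_j - \mu_j)^2$ is the variance of $X_j$.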


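11.9 Bivariate normal distribution

Take $n = 2$ in the above definitions, and let

$$\rho \triangleq \frac{\mathbb{E}(X_1 - \mu_1)(X_2 - \mu_2)}{\sigma_1 \sigma_2}.$$

Thus,

$$A^{-1} = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}, \qquad A = \begin{bmatrix} \dfrac{1}{\sigma_1^2 (1 - \rho^2)} & -\dfrac{\rho}{\sigma_1 \sigma_2 (1 - \rho^2)} \\ -\dfrac{\rho}{\sigma_1 \sigma_2 (1 - \rho^2)} & \dfrac{1}{\sigma_2^2 (1 - \rho^2)} \end{bmatrix},$$

$$\sqrt{\det A} = \frac{1}{\sigma_1 \sigma_2 \sqrt{1 - \rho^2}},$$

and we have the formula from Example 11.1, adjusted to account for the possibly non-zero expectations:

$$f_{X_1, X_2}(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho (x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right] \right\}.$$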


11.10 MGF of jointly normal random variables 


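Let $u = (u_1, u_2, \ldots, u_n)^T$ denote a column vector with components in $\mathbb{R}$, and let $X$ have a multivariate normal distribution with covariance matrix $A^{-1}$ and mean vector $\mu$. Then the moment generating function is given by

$$\mathbb{E} e^{u^T X} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{u^T x} f_X(x_1, \ldots, x_n) \, dx_1 \cdots dx_n = \exp\left\{ \tfrac{1}{2} u^T A^{-1} u + u^T \mu \right\}.$$

If any $n$ random variables $X_1, X_2, \ldots, X_n$ have this moment generating function, then they are jointly normal, and we can read out the means and covariances. The random variables are jointly normal and independent if and only if for any real column vector $u = (u_1, \ldots, u_n)^T$

$$\mathbb{E} e^{u^T X} \triangleq \mathbb{E} \exp\left\{ \sum_{j=1}^{n} u_j X_j \right\} = \exp\left\{ \sum_{j=1}^{n} \left[ \tfrac{1}{2} \sigma_j^2 u_j^2 + u_j \mu_j \right] \right\}.$$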


Chapter 12 


Semi-Continuous Models 


12.1 Discrete-time Brownian Motion 


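Let $\{Y_j\}_{j=1}^{n}$ be a collection of independent, standard normal random variables defined on $(\Omega, \mathcal{F}, \mathbb{P})$, where $\mathbb{P}$ is the market measure. As before we denote the column vector $(Y_1, \ldots, Y_n)^T$ by $Y$. We therefore have for any real column vector $u = (u_1, \ldots, u_n)^T$,

$$\mathbb{E} e^{u^T Y} = \mathbb{E} \exp\left\{ \sum_{j=1}^{n} u_j Y_j \right\} = \exp\left\{ \sum_{j=1}^{n} \tfrac{1}{2} u_j^2 \right\}.$$

Define the discrete-time Brownian motion (see Fig. 12.1):

$$B_0 = 0,$$

$$B_k = \sum_{j=1}^{k} Y_j, \quad k = 1, \ldots, n.$$

If we know $Y_1, Y_2, \ldots, Y_k$, then we know $B_1, B_2, \ldots, B_k$. Conversely, if we know $B_1, B_2, \ldots, B_k$, then we know $Y_1 = B_1$, $Y_2 = B_2 - B_1$, $\ldots$, $Y_k = B_k - B_{k-1}$. Define the filtration

$$\mathcal{F}_0 = \{\emptyset, \Omega\},$$

$$\mathcal{F}_k = \sigma(Y_1, Y_2, \ldots, Y_k) = \sigma(B_1, B_2, \ldots, B_k), \quad k = 1, \ldots, n.$$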

Theorem 1.34 $\{B_k\}_{k=0}^{n}$ is a martingale (under $\mathbb{P}$).

Proof:

$$\mathbb{E}[B_{k+1} \mid \mathcal{F}_k] = \mathbb{E}[Y_{k+1} + B_k \mid \mathcal{F}_k] = \mathbb{E} Y_{k+1} + B_k = B_k.$$

Figure 12.1: Discrete-time Brownian motion.


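Theorem 1.35 $\{B_k\}_{k=0}^{n}$ is a Markov process.

Proof: Note that

$$\mathbb{E}[h(B_{k+1}) \mid \mathcal{F}_k] = \mathbb{E}[h(Y_{k+1} + B_k) \mid \mathcal{F}_k].$$

Use the Independence Lemma. Define

$$g(b) = \mathbb{E} h(Y_{k+1} + b) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} h(y + b)\, e^{-\frac{1}{2} y^2} \, dy.$$

Then

$$\mathbb{E}[h(Y_{k+1} + B_k) \mid \mathcal{F}_k] = g(B_k),$$

which is a function of $B_k$ alone.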


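12.2 The Stock Price Process

Given parameters:

- $\mu \in \mathbb{R}$, the mean rate of return.
- $\sigma > 0$, the volatility.
- $S_0 > 0$, the initial stock price.

The stock price process is then given by

$$S_k = S_0 \exp\left\{ \sigma B_k + (\mu - \tfrac{1}{2}\sigma^2) k \right\}, \quad k = 0, \ldots, n.$$

Note that

$$S_{k+1} = S_k \exp\left\{ \sigma Y_{k+1} + (\mu - \tfrac{1}{2}\sigma^2) \right\},$$

$$\mathbb{E}[S_{k+1} \mid \mathcal{F}_k] = S_k\, \mathbb{E}[e^{\sigma Y_{k+1}} \mid \mathcal{F}_k] \cdot e^{\mu - \frac{1}{2}\sigma^2} = S_k\, e^{\frac{1}{2}\sigma^2} e^{\mu - \frac{1}{2}\sigma^2} = e^{\mu} S_k.$$

Thus

$$\mu = \log \frac{\mathbb{E}[S_{k+1} \mid \mathcal{F}_k]}{S_k} = \log \mathbb{E}\left[ \frac{S_{k+1}}{S_k} \,\Big|\, \mathcal{F}_k \right],$$

and

$$\operatorname{var}\left( \log \frac{S_{k+1}}{S_k} \right) = \operatorname{var}\left( \sigma Y_{k+1} + (\mu - \tfrac{1}{2}\sigma^2) \right) = \sigma^2.$$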


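12.3 Remainder of the Market

The other processes in the market are defined as follows.

Money market process:

$$M_k = e^{rk}, \quad k = 0, 1, \ldots, n.$$

Portfolio process:

- $\Delta_0, \Delta_1, \ldots, \Delta_{n-1}$,
- Each $\Delta_k$ is $\mathcal{F}_k$-measurable.

Wealth process:

- $X_0$ given, nonrandom.
- $X_{k+1} = \Delta_k S_{k+1} + e^{r}(X_k - \Delta_k S_k) = \Delta_k (S_{k+1} - e^{r} S_k) + e^{r} X_k$,
- Each $X_k$ is $\mathcal{F}_k$-measurable.

Discounted wealth process:

$$\frac{X_{k+1}}{M_{k+1}} = \Delta_k \left( \frac{S_{k+1}}{M_{k+1}} - \frac{S_k}{M_k} \right) + \frac{X_k}{M_k}.$$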


12.4 Risk-Neutral Measure 


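Definition 12.1 Let $\widetilde{\mathbb{P}}$ be a probability measure on $(\Omega, \mathcal{F})$, equivalent to the market measure $\mathbb{P}$. If $\left\{ \frac{S_k}{M_k} \right\}_{k=0}^{n}$ is a martingale under $\widetilde{\mathbb{P}}$, we say that $\widetilde{\mathbb{P}}$ is a risk-neutral measure.

Theorem 4.36 If $\widetilde{\mathbb{P}}$ is a risk-neutral measure, then every discounted wealth process $\left\{ \frac{X_k}{M_k} \right\}_{k=0}^{n}$ is a martingale under $\widetilde{\mathbb{P}}$, regardless of the portfolio process used to generate it.

Proof:

$$\widetilde{\mathbb{E}}\left[ \frac{X_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k \right] = \widetilde{\mathbb{E}}\left[ \Delta_k \left( \frac{S_{k+1}}{M_{k+1}} - \frac{S_k}{M_k} \right) + \frac{X_k}{M_k} \,\Big|\, \mathcal{F}_k \right]$$

$$= \Delta_k \left( \widetilde{\mathbb{E}}\left[ \frac{S_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k \right] - \frac{S_k}{M_k} \right) + \frac{X_k}{M_k}$$

$$= \frac{X_k}{M_k}.$$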

12.5 Risk-Neutral Pricing 


Let $V_n$ be the payoff at time $n$, and say it is $\mathcal{F}_n$-measurable. Note that $V_n$ may be path-dependent.

Hedging a short position:

- Sell the simple European derivative security $V_n$.
- Receive $X_0$ at time 0.
- Construct a portfolio process $\Delta_0, \ldots, \Delta_{n-1}$ which starts with $X_0$ and ends with $X_n = V_n$.
- If there is a risk-neutral measure $\widetilde{\mathbb{P}}$, then

$$X_0 = \widetilde{\mathbb{E}} \frac{X_n}{M_n} = \widetilde{\mathbb{E}} \frac{V_n}{M_n}.$$

Remark 12.1 Hedging in this "semi-continuous" model is usually not possible because there are not enough trading dates. This difficulty will disappear when we go to the fully continuous model.


12.6 Arbitrage 

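Definition 12.2 An arbitrage is a portfolio which starts with $X_0 = 0$ and ends with $X_n$ satisfying

$$\mathbb{P}(X_n \ge 0) = 1, \qquad \mathbb{P}(X_n > 0) > 0.$$

($\mathbb{P}$ here is the market measure.)

Theorem 6.37 (Fundamental Theorem of Asset Pricing: Easy part) If there is a risk-neutral measure, then there is no arbitrage.

Proof: Let $\widetilde{\mathbb{P}}$ be a risk-neutral measure, let $X_0 = 0$, and let $X_n$ be the final wealth corresponding to any portfolio process. Since $\left\{ \frac{X_k}{M_k} \right\}_{k=0}^{n}$ is a martingale under $\widetilde{\mathbb{P}}$,

$$\widetilde{\mathbb{E}} \frac{X_n}{M_n} = \widetilde{\mathbb{E}} \frac{X_0}{M_0} = 0. \qquad (6.1)$$

Suppose $\mathbb{P}(X_n \ge 0) = 1$. We have

$$\mathbb{P}(X_n \ge 0) = 1 \implies \mathbb{P}(X_n < 0) = 0 \implies \widetilde{\mathbb{P}}(X_n < 0) = 0 \implies \widetilde{\mathbb{P}}(X_n \ge 0) = 1. \qquad (6.2)$$

(6.1) and (6.2) imply $\widetilde{\mathbb{P}}(X_n = 0) = 1$. We have

$$\widetilde{\mathbb{P}}(X_n = 0) = 1 \implies \widetilde{\mathbb{P}}(X_n > 0) = 0 \implies \mathbb{P}(X_n > 0) = 0.$$

This is not an arbitrage.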
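12.7 Stalking the Risk-Neutral Measure

Recall that

- $Y_1, Y_2, \ldots, Y_n$ are independent, standard normal random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
- $S_k = S_0 \exp\left\{ \sigma B_k + (\mu - \tfrac{1}{2}\sigma^2) k \right\}$.
- $S_{k+1} = S_0 \exp\left\{ \sigma (B_k + Y_{k+1}) + (\mu - \tfrac{1}{2}\sigma^2)(k + 1) \right\} = S_k \exp\left\{ \sigma Y_{k+1} + (\mu - \tfrac{1}{2}\sigma^2) \right\}$.

Therefore,

$$\frac{S_{k+1}}{M_{k+1}} = \frac{S_k}{M_k} \exp\left\{ \sigma Y_{k+1} + (\mu - r - \tfrac{1}{2}\sigma^2) \right\},$$

$$\mathbb{E}\left[ \frac{S_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k \right] = \frac{S_k}{M_k}\, \mathbb{E}\left[ \exp\{\sigma Y_{k+1}\} \mid \mathcal{F}_k \right] \exp\{\mu - r - \tfrac{1}{2}\sigma^2\}$$

$$= \frac{S_k}{M_k} \exp\{\tfrac{1}{2}\sigma^2\} \exp\{\mu - r - \tfrac{1}{2}\sigma^2\} = e^{\mu - r} \frac{S_k}{M_k}.$$

If $\mu = r$, the market measure is risk neutral. If $\mu \ne r$, we must seek further.

$$\frac{S_{k+1}}{M_{k+1}} = \frac{S_k}{M_k} \exp\left\{ \sigma Y_{k+1} + (\mu - r - \tfrac{1}{2}\sigma^2) \right\} = \frac{S_k}{M_k} \exp\left\{ \sigma \left( Y_{k+1} + \frac{\mu - r}{\sigma} \right) - \tfrac{1}{2}\sigma^2 \right\} = \frac{S_k}{M_k} \exp\left\{ \sigma \tilde{Y}_{k+1} - \tfrac{1}{2}\sigma^2 \right\},$$

where

$$\tilde{Y}_{k+1} = Y_{k+1} + \frac{\mu - r}{\sigma}.$$

The quantity $\frac{\mu - r}{\sigma}$ is denoted $\theta$ and is called the market price of risk.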


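We want a probability measure $\widetilde{\mathbb{P}}$ under which $\tilde{Y}_1, \ldots, \tilde{Y}_n$ are independent, standard normal random variables. Then we would have

$$\widetilde{\mathbb{E}}\left[ \frac{S_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k \right] = \frac{S_k}{M_k}\, \widetilde{\mathbb{E}}\left[ \exp\{\sigma \tilde{Y}_{k+1}\} \mid \mathcal{F}_k \right] \cdot \exp\{-\tfrac{1}{2}\sigma^2\} = \frac{S_k}{M_k} \exp\{\tfrac{1}{2}\sigma^2\} \cdot \exp\{-\tfrac{1}{2}\sigma^2\} = \frac{S_k}{M_k}.$$

Cameron-Martin-Girsanov's Idea: Define the random variable

$$Z = \exp\left[ \sum_{j=1}^{n} \left( -\theta Y_j - \tfrac{1}{2}\theta^2 \right) \right].$$

Properties of $Z$:

- $Z > 0$.
- $\mathbb{E} Z = \mathbb{E} \exp\left[ \sum_{j=1}^{n} (-\theta Y_j) \right] \cdot \exp\left( -\tfrac{n}{2}\theta^2 \right) = \exp\left\{ \tfrac{n}{2}\theta^2 \right\} \cdot \exp\left\{ -\tfrac{n}{2}\theta^2 \right\} = 1.$

Define

$$\widetilde{\mathbb{P}}(A) = \int_A Z \, d\mathbb{P} \quad \forall A \in \mathcal{F}.$$

Then $\widetilde{\mathbb{P}}(A) \ge 0$ for all $A \in \mathcal{F}$ and

$$\widetilde{\mathbb{P}}(\Omega) = \mathbb{E} Z = 1.$$

In other words, $\widetilde{\mathbb{P}}$ is a probability measure.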


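We show that $\widetilde{\mathbb{P}}$ is a risk-neutral measure. For this, it suffices to show that

$$\tilde{Y}_1 = Y_1 + \theta, \quad \ldots, \quad \tilde{Y}_n = Y_n + \theta$$

are independent, standard normal under $\widetilde{\mathbb{P}}$.

Verification:

- $Y_1, Y_2, \ldots, Y_n$: Independent, standard normal under $\mathbb{P}$, and

$$\mathbb{E} \exp\left[ \sum_{j=1}^{n} u_j Y_j \right] = \exp\left[ \sum_{j=1}^{n} \tfrac{1}{2} u_j^2 \right].$$

- $\tilde{Y}_1 = Y_1 + \theta, \ldots, \tilde{Y}_n = Y_n + \theta$.
- $Z > 0$ almost surely.
- $Z = \exp\left[ \sum_{j=1}^{n} (-\theta Y_j - \tfrac{1}{2}\theta^2) \right]$,

$$\widetilde{\mathbb{P}}(A) = \int_A Z \, d\mathbb{P} \quad \forall A \in \mathcal{F},$$

$$\widetilde{\mathbb{E}} X = \mathbb{E}(XZ) \text{ for every random variable } X.$$

- Compute the moment generating function of $(\tilde{Y}_1, \ldots, \tilde{Y}_n)$ under $\widetilde{\mathbb{P}}$:

$$\widetilde{\mathbb{E}} \exp\left[ \sum_{j=1}^{n} u_j \tilde{Y}_j \right] = \mathbb{E} \exp\left[ \sum_{j=1}^{n} u_j (Y_j + \theta) + \sum_{j=1}^{n} (-\theta Y_j - \tfrac{1}{2}\theta^2) \right]$$

$$= \mathbb{E} \exp\left[ \sum_{j=1}^{n} (u_j - \theta) Y_j \right] \cdot \exp\left[ \sum_{j=1}^{n} (u_j \theta - \tfrac{1}{2}\theta^2) \right]$$

$$= \exp\left[ \sum_{j=1}^{n} \tfrac{1}{2} (u_j - \theta)^2 \right] \cdot \exp\left[ \sum_{j=1}^{n} (u_j \theta - \tfrac{1}{2}\theta^2) \right]$$

$$= \exp\left[ \sum_{j=1}^{n} \left( \tfrac{1}{2} u_j^2 - u_j \theta + \tfrac{1}{2}\theta^2 + u_j \theta - \tfrac{1}{2}\theta^2 \right) \right] = \exp\left[ \sum_{j=1}^{n} \tfrac{1}{2} u_j^2 \right].$$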


12.8 Pricing a European Call 


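Stock price at time $n$ is

$$S_n = S_0 \exp\left\{ \sigma B_n + (\mu - \tfrac{1}{2}\sigma^2) n \right\}$$

$$= S_0 \exp\left\{ \sigma \sum_{j=1}^{n} Y_j + (\mu - \tfrac{1}{2}\sigma^2) n \right\}$$

$$= S_0 \exp\left\{ \sigma \sum_{j=1}^{n} \left( \tilde{Y}_j - \frac{\mu - r}{\sigma} \right) + (\mu - \tfrac{1}{2}\sigma^2) n \right\}$$

$$= S_0 \exp\left\{ \sigma \sum_{j=1}^{n} \tilde{Y}_j + (r - \tfrac{1}{2}\sigma^2) n \right\}.$$

Payoff at time $n$ is $(S_n - K)^+$. Price at time zero is

$$\widetilde{\mathbb{E}} \frac{(S_n - K)^+}{M_n} = \widetilde{\mathbb{E}}\left[ e^{-rn} \left( S_0 \exp\left\{ \sigma \sum_{j=1}^{n} \tilde{Y}_j + (r - \tfrac{1}{2}\sigma^2) n \right\} - K \right)^+ \right]$$

$$= \int_{-\infty}^{\infty} e^{-rn} \left( S_0 \exp\left\{ \sigma b + (r - \tfrac{1}{2}\sigma^2) n \right\} - K \right)^+ \cdot \frac{1}{\sqrt{2\pi n}}\, e^{-\frac{b^2}{2n}} \, db,$$

since $\sum_{j=1}^{n} \tilde{Y}_j$ is normal with mean 0, variance $n$, under $\widetilde{\mathbb{P}}$.

This is the Black-Scholes price. It does not depend on $\mu$.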
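
The time-zero price of the call is the integral of $e^{-rn}\left(S_0 e^{\sigma b + (r - \sigma^2/2)n} - K\right)^+$ against the $N(0, n)$ density. As an illustrative check, this integral can be evaluated numerically and compared with the standard closed-form Black-Scholes call formula with maturity $T = n$. The sketch below uses a plain midpoint rule; the function names, the truncation width, the step count and the parameter values are all assumptions made for the illustration.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def call_price_integral(s0, k, r, sigma, n, steps=20_000, width=10.0):
    """Midpoint-rule approximation of the risk-neutral pricing integral.

    Integrates over b in [-width*sqrt(n), width*sqrt(n)]; the normal density
    is negligible outside this range.
    """
    lo = -width * sqrt(n)
    h = 2.0 * width * sqrt(n) / steps
    total = 0.0
    for i in range(steps):
        b = lo + (i + 0.5) * h
        payoff = max(s0 * exp(sigma * b + (r - 0.5 * sigma ** 2) * n) - k, 0.0)
        density = exp(-b * b / (2.0 * n)) / sqrt(2.0 * pi * n)
        total += exp(-r * n) * payoff * density
    return total * h


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call with maturity t."""
    d1 = (log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s0 * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)


# Illustrative per-period parameters.
numeric = call_price_integral(s0=100.0, k=100.0, r=0.01, sigma=0.1, n=4)
closed_form = black_scholes_call(s0=100.0, k=100.0, r=0.01, sigma=0.1, t=4)
```

Neither function takes $\mu$ as an argument, which mirrors the observation that the price does not depend on the mean rate of return.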


Chapter 13 


Brownian Motion 


13.1 Symmetric Random Walk 


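Toss a fair coin infinitely many times. Define

$$X_j(\omega) = \begin{cases} 1 & \text{if } \omega_j = H, \\ -1 & \text{if } \omega_j = T. \end{cases}$$

Set

$$M_0 = 0, \qquad M_k = \sum_{j=1}^{k} X_j, \quad k \ge 1.$$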


13.2 The Law of Large Numbers 


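We will use the method of moment generating functions to derive the Law of Large Numbers:

Theorem 2.38 (Law of Large Numbers)

$$\frac{1}{k} M_k \to 0 \text{ almost surely, as } k \to \infty.$$

Proof:

$$\varphi_k(u) = \mathbb{E} \exp\left\{ \frac{u}{k} M_k \right\}$$

$$= \mathbb{E} \exp\left\{ \sum_{j=1}^{k} \frac{u}{k} X_j \right\} \quad \text{(Def. of } M_k\text{)}$$

$$= \prod_{j=1}^{k} \mathbb{E} \exp\left\{ \frac{u}{k} X_j \right\} \quad \text{(Independence of the } X_j\text{'s)}$$

$$= \left( \tfrac{1}{2} e^{\frac{u}{k}} + \tfrac{1}{2} e^{-\frac{u}{k}} \right)^k,$$

which implies

$$\log \varphi_k(u) = k \log\left( \tfrac{1}{2} e^{\frac{u}{k}} + \tfrac{1}{2} e^{-\frac{u}{k}} \right).$$

Let $x = \frac{1}{k}$. Then

$$\lim_{k \to \infty} \log \varphi_k(u) = \lim_{x \to 0} \frac{\log\left( \tfrac{1}{2} e^{ux} + \tfrac{1}{2} e^{-ux} \right)}{x} = \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{\tfrac{1}{2} e^{ux} + \tfrac{1}{2} e^{-ux}} \quad \text{(L'Hôpital's Rule)} \quad = 0.$$

Therefore,

$$\lim_{k \to \infty} \varphi_k(u) = e^0 = 1,$$

which is the m.g.f. for the constant 0.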


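13.3 Central Limit Theorem

We use the method of moment generating functions to prove the Central Limit Theorem.

Theorem 3.39 (Central Limit Theorem)

$$\frac{1}{\sqrt{k}} M_k \to \text{Standard normal, as } k \to \infty.$$

Proof:

$$\varphi_k(u) = \mathbb{E} \exp\left\{ \frac{u}{\sqrt{k}} M_k \right\} = \left( \tfrac{1}{2} e^{\frac{u}{\sqrt{k}}} + \tfrac{1}{2} e^{-\frac{u}{\sqrt{k}}} \right)^k,$$

so that

$$\log \varphi_k(u) = k \log\left( \tfrac{1}{2} e^{\frac{u}{\sqrt{k}}} + \tfrac{1}{2} e^{-\frac{u}{\sqrt{k}}} \right).$$

Let $x = \frac{1}{\sqrt{k}}$. Then

$$\lim_{k \to \infty} \log \varphi_k(u) = \lim_{x \to 0} \frac{\log\left( \tfrac{1}{2} e^{ux} + \tfrac{1}{2} e^{-ux} \right)}{x^2}$$

$$= \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{2x \left( \tfrac{1}{2} e^{ux} + \tfrac{1}{2} e^{-ux} \right)} \quad \text{(L'Hôpital's Rule)}$$

$$= \lim_{x \to 0} \frac{1}{\tfrac{1}{2} e^{ux} + \tfrac{1}{2} e^{-ux}} \cdot \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{2x}$$

$$= \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{2x}$$

$$= \lim_{x \to 0} \frac{\tfrac{u^2}{2} e^{ux} + \tfrac{u^2}{2} e^{-ux}}{2} \quad \text{(L'Hôpital's Rule)}$$

$$= \tfrac{1}{2} u^2.$$

Therefore,

$$\lim_{k \to \infty} \varphi_k(u) = e^{\frac{1}{2} u^2},$$

which is the m.g.f. for a standard normal random variable.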
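
Both limits can be observed numerically, because $\tfrac{1}{2}e^{a} + \tfrac{1}{2}e^{-a} = \cosh a$ gives the moment generating functions in closed form: $(\cosh(u/k))^k$ for $M_k/k$ and $(\cosh(u/\sqrt{k}))^k$ for $M_k/\sqrt{k}$. The short sketch below evaluates both for a large $k$; the function names and the choice $u = 1$, $k = 10^6$ are illustrative assumptions.

```python
from math import cosh, exp, sqrt


def mgf_lln(u, k):
    """M.g.f. of M_k / k for the symmetric random walk."""
    return cosh(u / k) ** k


def mgf_clt(u, k):
    """M.g.f. of M_k / sqrt(k) for the symmetric random walk."""
    return cosh(u / sqrt(k)) ** k


u, k = 1.0, 10 ** 6
lln_value = mgf_lln(u, k)          # should be close to e^0 = 1
clt_value = mgf_clt(u, k)          # should be close to e^{u^2/2}
clt_limit = exp(0.5 * u ** 2)
```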


13.4 Brownian Motion as a Limit of Random Walks 


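Let $n$ be a positive integer. If $t \ge 0$ is of the form $\frac{k}{n}$, then set

$$B^{(n)}(t) = \frac{1}{\sqrt{n}} M_{tn} = \frac{1}{\sqrt{n}} M_k.$$

If $t \ge 0$ is not of the form $\frac{k}{n}$, then define $B^{(n)}(t)$ by linear interpolation (see Fig. 13.1).

Figure 13.1: Linear Interpolation to define $B^{(n)}(t)$.

Here are some properties of $B^{(100)}(t)$:

Properties of $B^{(100)}(1)$:

$$B^{(100)}(1) = \frac{1}{10} \sum_{j=1}^{100} X_j \quad \text{(Approximately normal)}$$

$$\mathbb{E} B^{(100)}(1) = \frac{1}{10} \sum_{j=1}^{100} \mathbb{E} X_j = 0,$$

$$\operatorname{var}(B^{(100)}(1)) = \frac{1}{100} \sum_{j=1}^{100} \operatorname{var}(X_j) = 1.$$

Properties of $B^{(100)}(2)$:

$$B^{(100)}(2) = \frac{1}{10} \sum_{j=1}^{200} X_j \quad \text{(Approximately normal)}$$

Also note that:

- $B^{(100)}(1)$ and $B^{(100)}(2) - B^{(100)}(1)$ are independent.
- $B^{(100)}(t)$ is a continuous function of $t$.

To get Brownian motion, let $n \to \infty$ in $B^{(n)}(t)$, $t \ge 0$.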
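
The mean and variance of the scaled random walk at time 1 can be illustrated by simulation. The sketch below is a hypothetical helper, not part of the text: it draws $n$ fair coin tosses at once as $n$ random bits, counts heads, and returns $B^{(n)}(1) = M_n / \sqrt{n}$ with $M_n = 2\,(\text{heads}) - n$. The seed and sample size are arbitrary choices.

```python
import random


def scaled_walk_at_1(n, rng):
    """Sample B^(n)(1) = M_n / sqrt(n) for a symmetric random walk."""
    heads = bin(rng.getrandbits(n)).count("1")  # number of H among n tosses
    return (2 * heads - n) / n ** 0.5           # M_n = heads - tails


rng = random.Random(0)  # fixed seed for reproducibility
samples = [scaled_walk_at_1(100, rng) for _ in range(20_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

Up to sampling error, `sample_mean` is near 0 and `sample_var` is near 1, in line with the properties of $B^{(100)}(1)$ listed above.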


13.5 Brownian Motion 


(Please refer to Oksendal, Chapter 2.) 


Figure 13.2: Continuous-time Brownian Motion.

A random variable $B(t)$ (see Fig. 13.2) is called a Brownian Motion if it satisfies the following properties:

1. $B(0) = 0$,

2. $B(t)$ is a continuous function of $t$;

3. $B$ has independent, normally distributed increments: If

$$0 = t_0 < t_1 < t_2 < \ldots < t_n$$

and

$$Y_1 = B(t_1) - B(t_0), \quad Y_2 = B(t_2) - B(t_1), \quad \ldots \quad Y_n = B(t_n) - B(t_{n-1}),$$

then

- $Y_1, Y_2, \ldots, Y_n$ are independent,
- $\mathbb{E} Y_j = 0 \ \forall j$,
- $\operatorname{var}(Y_j) = t_j - t_{j-1} \ \forall j$.

13.6 Covariance of Brownian Motion 


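Let $0 \le s \le t$ be given. Then $B(s)$ and $B(t) - B(s)$ are independent, so $B(s)$ and $B(t) = (B(t) - B(s)) + B(s)$ are jointly normal. Moreover,

$$\mathbb{E} B(s) = 0, \quad \operatorname{var}(B(s)) = s,$$

$$\mathbb{E} B(t) = 0, \quad \operatorname{var}(B(t)) = t,$$

$$\mathbb{E} B(s) B(t) = \mathbb{E} B(s) \left[ (B(t) - B(s)) + B(s) \right] = \underbrace{\mathbb{E} B(s)(B(t) - B(s))}_{0} + \underbrace{\mathbb{E} B^2(s)}_{s} = s.$$

Thus for any $s \ge 0$, $t \ge 0$ (not necessarily $s \le t$), we have

$$\mathbb{E} B(s) B(t) = s \wedge t.$$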


13.7 Finite-Dimensional Distributions of Brownian Motion 


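Let

$$0 < t_1 < t_2 < \ldots < t_n$$

be given. Then

$$(B(t_1), B(t_2), \ldots, B(t_n))$$

is jointly normal with covariance matrix

$$C = \begin{bmatrix} \mathbb{E} B^2(t_1) & \mathbb{E} B(t_1) B(t_2) & \cdots & \mathbb{E} B(t_1) B(t_n) \\ \mathbb{E} B(t_2) B(t_1) & \mathbb{E} B^2(t_2) & \cdots & \mathbb{E} B(t_2) B(t_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbb{E} B(t_n) B(t_1) & \mathbb{E} B(t_n) B(t_2) & \cdots & \mathbb{E} B^2(t_n) \end{bmatrix} = \begin{bmatrix} t_1 & t_1 & \cdots & t_1 \\ t_1 & t_2 & \cdots & t_2 \\ \vdots & \vdots & \ddots & \vdots \\ t_1 & t_2 & \cdots & t_n \end{bmatrix}.$$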
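
Because $\mathbb{E} B(s)B(t) = s \wedge t$, the covariance matrix of $(B(t_1), \ldots, B(t_n))$ has entries $\min(t_i, t_j)$. The sketch below builds this matrix directly and, as a cross-check, rebuilds it from the independent increments: $B(t_i)$ and $B(t_j)$ share exactly the increments up to $\min(i, j)$, whose variances add up to $t_i \wedge t_j$. The function names and sample times are illustrative assumptions.

```python
def brownian_covariance(times):
    """Covariance matrix of (B(t_1), ..., B(t_n)): entries min(t_i, t_j)."""
    return [[min(s, t) for t in times] for s in times]


def covariance_from_increments(times):
    """Same matrix, summing variances of the shared independent increments."""
    # Variance of the j-th increment is t_j - t_{j-1}, with t_0 = 0.
    increments = [t - s for s, t in zip([0.0] + list(times[:-1]), times)]
    n = len(times)
    return [[sum(increments[: min(i, j) + 1]) for j in range(n)] for i in range(n)]


times = [0.5, 1.0, 2.5]
C = brownian_covariance(times)
C_check = covariance_from_increments(times)
```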


13.8 Filtration generated by a Brownian Motion 


$$\{\mathcal{F}(t)\}_{t \ge 0}$$

Required properties:

- For each $t$, $B(t)$ is $\mathcal{F}(t)$-measurable,
- For each $t$ and for $t < t_1 < t_2 < \cdots < t_n$, the Brownian motion increments

$$B(t_1) - B(t), \quad B(t_2) - B(t_1), \quad \ldots, \quad B(t_n) - B(t_{n-1})$$

are independent of $\mathcal{F}(t)$.

Here is one way to construct $\mathcal{F}(t)$. First fix $t$. Let $s \in [0, t]$ and $C \in \mathcal{B}(\mathbb{R})$ be given. Put the set

$$\{B(s) \in C\} = \{\omega : B(s, \omega) \in C\}$$

in $\mathcal{F}(t)$. Do this for all possible numbers $s \in [0, t]$ and $C \in \mathcal{B}(\mathbb{R})$. Then put in every other set required by the $\sigma$-algebra properties.

This $\mathcal{F}(t)$ contains exactly the information learned by observing the Brownian motion up to time $t$. $\{\mathcal{F}(t)\}_{t \ge 0}$ is called the filtration generated by the Brownian motion.


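13.9 Martingale Property

Theorem 9.40 Brownian motion is a martingale.

Proof: Let $0 \le s \le t$ be given. Then

$$\mathbb{E}[B(t) \mid \mathcal{F}(s)] = \mathbb{E}[(B(t) - B(s)) + B(s) \mid \mathcal{F}(s)] = \mathbb{E}[B(t) - B(s)] + B(s) = B(s).$$

Theorem 9.41 Let $\theta \in \mathbb{R}$ be given. Then

$$Z(t) = \exp\left\{ -\theta B(t) - \tfrac{1}{2}\theta^2 t \right\}$$

is a martingale.

Proof: Let $0 \le s \le t$ be given. Then

$$\mathbb{E}[Z(t) \mid \mathcal{F}(s)] = \mathbb{E}\left[ \exp\left\{ -\theta (B(t) - B(s) + B(s)) - \tfrac{1}{2}\theta^2 ((t - s) + s) \right\} \,\Big|\, \mathcal{F}(s) \right]$$

$$= \mathbb{E}\left[ Z(s) \exp\left\{ -\theta (B(t) - B(s)) - \tfrac{1}{2}\theta^2 (t - s) \right\} \,\Big|\, \mathcal{F}(s) \right]$$

$$= Z(s)\, \mathbb{E}\left[ \exp\left\{ -\theta (B(t) - B(s)) - \tfrac{1}{2}\theta^2 (t - s) \right\} \right]$$

$$= Z(s) \exp\left\{ \tfrac{1}{2} (-\theta)^2 \operatorname{var}(B(t) - B(s)) - \tfrac{1}{2}\theta^2 (t - s) \right\}$$

$$= Z(s).$$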
a 


13.10 The Limit of a Binomial Model 


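Consider the $n$'th Binomial model with the following parameters:

- $u_n = 1 + \frac{\sigma}{\sqrt{n}}$. "Up" factor. ($\sigma > 0$).
- $d_n = 1 - \frac{\sigma}{\sqrt{n}}$. "Down" factor.
- $r = 0$.

Let $\#_k(H)$ denote the number of $H$ in the first $k$ tosses, and let $\#_k(T)$ denote the number of $T$ in the first $k$ tosses. Then

$$\#_k(H) + \#_k(T) = k,$$

$$\#_k(H) - \#_k(T) = M_k,$$

which implies,

$$\#_k(H) = \tfrac{1}{2}(k + M_k),$$

$$\#_k(T) = \tfrac{1}{2}(k - M_k).$$

In the $n$'th model, take $n$ steps per unit time. Set $S_0^{(n)} = 1$. Let $t = \frac{k}{n}$ for some $k$, and let

$$S^{(n)}(t) = \left( 1 + \frac{\sigma}{\sqrt{n}} \right)^{\frac{1}{2}(nt + M_{nt})} \left( 1 - \frac{\sigma}{\sqrt{n}} \right)^{\frac{1}{2}(nt - M_{nt})}.$$

Under $\mathbb{P}$, the price process $S^{(n)}$ is a martingale.

Theorem 10.42 As $n \to \infty$, the distribution of $S^{(n)}(t)$ converges to the distribution of

$$\exp\left\{ \sigma B(t) - \tfrac{1}{2}\sigma^2 t \right\},$$

where $B$ is a Brownian motion. Note that the correction $-\tfrac{1}{2}\sigma^2 t$ is necessary in order to have a martingale.

Proof: Recall that from the Taylor series we have

$$\log(1 + x) = x - \tfrac{1}{2} x^2 + O(x^3),$$

so

$$\log S^{(n)}(t) = \tfrac{1}{2}(nt + M_{nt}) \log\left( 1 + \frac{\sigma}{\sqrt{n}} \right) + \tfrac{1}{2}(nt - M_{nt}) \log\left( 1 - \frac{\sigma}{\sqrt{n}} \right)$$

$$= nt \left( \tfrac{1}{2} \log\left( 1 + \frac{\sigma}{\sqrt{n}} \right) + \tfrac{1}{2} \log\left( 1 - \frac{\sigma}{\sqrt{n}} \right) \right) + M_{nt} \left( \tfrac{1}{2} \log\left( 1 + \frac{\sigma}{\sqrt{n}} \right) - \tfrac{1}{2} \log\left( 1 - \frac{\sigma}{\sqrt{n}} \right) \right)$$

$$= nt \left( \tfrac{1}{2} \frac{\sigma}{\sqrt{n}} - \tfrac{1}{4} \frac{\sigma^2}{n} - \tfrac{1}{2} \frac{\sigma}{\sqrt{n}} - \tfrac{1}{4} \frac{\sigma^2}{n} + O(n^{-3/2}) \right) + M_{nt} \left( \tfrac{1}{2} \frac{\sigma}{\sqrt{n}} - \tfrac{1}{4} \frac{\sigma^2}{n} + \tfrac{1}{2} \frac{\sigma}{\sqrt{n}} + \tfrac{1}{4} \frac{\sigma^2}{n} + O(n^{-3/2}) \right)$$

$$= -\tfrac{1}{2} \sigma^2 t + O(n^{-1/2}) + \sigma \underbrace{\left( \frac{1}{\sqrt{n}} M_{nt} \right)}_{\to B(t)} + \underbrace{\left( \frac{1}{n} M_{nt} \right)}_{\to 0} O(n^{-1/2}).$$

As $n \to \infty$, the distribution of $\log S^{(n)}(t)$ approaches the distribution of $\sigma B(t) - \tfrac{1}{2}\sigma^2 t$.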
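
The limit can be made concrete by computing the mean and variance of $\log S^{(n)}(t)$ exactly. With $a = \sigma/\sqrt{n}$, the decomposition used in the proof reads $\log S^{(n)}(t) = nt \cdot \tfrac{1}{2}[\log(1+a) + \log(1-a)] + M_{nt} \cdot \tfrac{1}{2}[\log(1+a) - \log(1-a)]$, and $\mathbb{E} M_{nt} = 0$, $\operatorname{var}(M_{nt}) = nt$. The sketch below (function name and parameter values are illustrative assumptions) evaluates these moments for a large $n$ and compares them with the moments $-\tfrac{1}{2}\sigma^2 t$ and $\sigma^2 t$ of $\sigma B(t) - \tfrac{1}{2}\sigma^2 t$.

```python
from math import log, sqrt


def log_price_moments(n, t, sigma):
    """Exact mean and variance of log S^(n)(t) in the n-th binomial model.

    Assumes n*t is an integer number of steps and sigma/sqrt(n) < 1.
    """
    a = sigma / sqrt(n)
    steps = round(n * t)
    up, down = log(1 + a), log(1 - a)
    mean = steps * 0.5 * (up + down)             # E M_{nt} = 0
    var = steps * (0.5 * (up - down)) ** 2       # var M_{nt} = nt
    return mean, var


sigma, t = 0.3, 2.0
mean_n, var_n = log_price_moments(n=10 ** 6, t=t, sigma=sigma)
mean_limit = -0.5 * sigma ** 2 * t   # mean of sigma*B(t) - sigma^2 t / 2
var_limit = sigma ** 2 * t           # variance of sigma*B(t)
```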


Figure 13.3: Continuous-time Brownian Motion, starting at $x \ne 0$.

13.11 Starting at Points Other Than 0

(The remaining sections in this chapter were taught Dec 7.)

For a Brownian motion $B(t)$ that starts at 0, we have:

$$\mathbb{P}(B(0) = 0) = 1.$$

For a Brownian motion $B(t)$ that starts at $x$, denote the corresponding probability measure by $\mathbb{P}^x$ (see Fig. 13.3), and for such a Brownian motion we have:

$$\mathbb{P}^x(B(0) = x) = 1.$$

Note that:

- If $x \ne 0$, then $\mathbb{P}^x$ puts all its probability on a completely different set from $\mathbb{P}$.
- The distribution of $B(t)$ under $\mathbb{P}^x$ is the same as the distribution of $x + B(t)$ under $\mathbb{P}$.


13.12 Markov Property for Brownian Motion 


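We prove that

Theorem 12.43 Brownian motion has the Markov property.

Proof: Let $s \ge 0$, $t \ge 0$ be given (see Fig. 13.4).

$$\mathbb{E}\left[ h(B(s + t)) \mid \mathcal{F}(s) \right] = \mathbb{E}\Big[ h\big( \underbrace{B(s + t) - B(s)}_{\text{Independent of } \mathcal{F}(s)} + \underbrace{B(s)}_{\mathcal{F}(s)\text{-measurable}} \big) \,\Big|\, \mathcal{F}(s) \Big].$$

Figure 13.4: Markov Property of Brownian Motion.

Use the Independence Lemma. Define

$$g(x) = \mathbb{E}\left[ h(B(s + t) - B(s) + x) \right] = \mathbb{E}\Big[ h\big( x + \underbrace{B(t)}_{\text{same distribution as } B(s + t) - B(s)} \big) \Big] = \mathbb{E}^x h(B(t)).$$

Then

$$\mathbb{E}\left[ h(B(s + t)) \mid \mathcal{F}(s) \right] = g(B(s)) = \mathbb{E}^{B(s)} h(B(t)).$$

In fact Brownian motion has the strong Markov property.

Example 13.1 (Strong Markov Property) See Fig. 13.5. Fix $x > 0$ and define

$$\tau = \min\{t \ge 0;\ B(t) = x\}.$$

Then we have:

$$\mathbb{E}\left[ h(B(\tau + t)) \mid \mathcal{F}(\tau) \right] = g(x) = \mathbb{E}^x h(B(t)).$$

Figure 13.5: Strong Markov Property of Brownian Motion.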


13.13 Transition Density 


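Let $p(t, x, y)$ be the transition density, i.e., the probability density in $y$ that the Brownian motion changes value from $x$ to $y$ in time $t$, and let $\tau$ be defined as in the previous section.

$$p(t, x, y) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{(y - x)^2}{2t}},$$

$$g(x) = \mathbb{E}^x h(B(t)) = \int_{-\infty}^{\infty} h(y)\, p(t, x, y) \, dy.$$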


13.14 First Passage Time 


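Fix $x > 0$. Define

$$\tau = \min\{t \ge 0;\ B(t) = x\}.$$

Fix $\theta > 0$. Then

$$\exp\left\{ \theta B(t \wedge \tau) - \tfrac{1}{2}\theta^2 (t \wedge \tau) \right\}$$

is a martingale, and

$$\mathbb{E} \exp\left\{ \theta B(t \wedge \tau) - \tfrac{1}{2}\theta^2 (t \wedge \tau) \right\} = 1.$$

We have

$$\lim_{t \to \infty} \exp\left\{ -\tfrac{1}{2}\theta^2 (t \wedge \tau) \right\} = \begin{cases} e^{-\frac{1}{2}\theta^2 \tau} & \text{if } \tau < \infty, \\ 0 & \text{if } \tau = \infty, \end{cases} \qquad (14.1)$$

$$0 \le \exp\left\{ \theta B(t \wedge \tau) - \tfrac{1}{2}\theta^2 (t \wedge \tau) \right\} \le e^{\theta x}.$$

Let $t \to \infty$ in (14.1), using the Bounded Convergence Theorem, to get

$$\mathbb{E}\left[ \exp\left\{ \theta x - \tfrac{1}{2}\theta^2 \tau \right\} 1_{\{\tau < \infty\}} \right] = 1.$$

Let $\theta \downarrow 0$ to get $\mathbb{E} 1_{\{\tau < \infty\}} = 1$, so

$$\mathbb{P}\{\tau < \infty\} = 1,$$

$$\mathbb{E} \exp\left\{ -\tfrac{1}{2}\theta^2 \tau \right\} = e^{-\theta x}. \qquad (14.2)$$

Let $\alpha = \tfrac{1}{2}\theta^2$. We have the m.g.f.:

$$\mathbb{E} e^{-\alpha \tau} = e^{-x \sqrt{2\alpha}}, \quad \alpha > 0. \qquad (14.3)$$

Differentiation of (14.3) w.r.t. $\alpha$ yields

$$-\mathbb{E}\left[ \tau e^{-\alpha \tau} \right] = -\frac{x}{\sqrt{2\alpha}}\, e^{-x \sqrt{2\alpha}}.$$

Letting $\alpha \downarrow 0$, we obtain

$$\mathbb{E} \tau = \infty. \qquad (14.4)$$

Conclusion. Brownian motion reaches level $x$ with probability 1. The expected time to reach level $x$ is infinite.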


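We use the Reflection Principle below (see Fig. 13.6).

$$\mathbb{P}\{\tau \le t,\ B(t) < x\} = \mathbb{P}\{B(t) > x\},$$

$$\mathbb{P}\{\tau \le t\} = \mathbb{P}\{\tau \le t,\ B(t) < x\} + \mathbb{P}\{\tau \le t,\ B(t) > x\}$$

$$= \mathbb{P}\{B(t) > x\} + \mathbb{P}\{B(t) > x\}$$

$$= 2\, \mathbb{P}\{B(t) > x\}$$

$$= \frac{2}{\sqrt{2\pi t}} \int_x^{\infty} e^{-\frac{y^2}{2t}} \, dy.$$

Figure 13.6: Reflection Principle in Brownian Motion.

Using the substitution $z = \frac{y}{\sqrt{t}}$, $dz = \frac{dy}{\sqrt{t}}$, we get

$$\mathbb{P}\{\tau \le t\} = \frac{2}{\sqrt{2\pi}} \int_{x/\sqrt{t}}^{\infty} e^{-\frac{z^2}{2}} \, dz.$$

Density:

$$f_\tau(t) = \frac{\partial}{\partial t} \mathbb{P}\{\tau \le t\} = \frac{x}{\sqrt{2\pi}\, t^{3/2}}\, e^{-\frac{x^2}{2t}},$$

which follows from the fact that if

$$F(t) = \int_{a(t)}^{b} g(z) \, dz,$$

then

$$\frac{\partial F}{\partial t} = -\frac{\partial a}{\partial t}\, g(a(t)).$$

Laplace transform formula:

$$\mathbb{E} e^{-\alpha \tau} = \int_0^{\infty} e^{-\alpha t} f_\tau(t) \, dt = e^{-x \sqrt{2\alpha}}.$$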
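
The first passage results can be cross-checked numerically. The sketch below assumes the reflection-principle distribution $\mathbb{P}\{\tau \le t\} = 2\,\mathbb{P}\{B(t) > x\}$ and the corresponding density $f_\tau(t) = \frac{x}{\sqrt{2\pi}\, t^{3/2}} e^{-x^2/(2t)}$, and compares a midpoint-rule approximation of $\int_0^\infty e^{-\alpha t} f_\tau(t)\,dt$ with $e^{-x\sqrt{2\alpha}}$ from (14.3). Function names, the truncation point and the step count are illustrative assumptions.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def hitting_cdf(t, x):
    """P{tau <= t} = 2 P{B(t) > x} for level x > 0 (reflection principle)."""
    return 2.0 * (1.0 - norm_cdf(x / sqrt(t)))


def hitting_density(t, x):
    """Density of the first passage time to level x > 0."""
    return x / (sqrt(2.0 * pi) * t ** 1.5) * exp(-x * x / (2.0 * t))


def laplace_transform_numeric(alpha, x, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of E exp(-alpha * tau), truncated at `upper`."""
    h = upper / steps
    return h * sum(
        exp(-alpha * (i + 0.5) * h) * hitting_density((i + 0.5) * h, x)
        for i in range(steps)
    )


alpha, x = 1.0, 1.0
numeric = laplace_transform_numeric(alpha, x)
exact = exp(-x * sqrt(2.0 * alpha))   # right-hand side of (14.3)

# Central difference of the c.d.f. should reproduce the density.
eps = 1e-5
density_fd = (hitting_cdf(1.0 + eps, x) - hitting_cdf(1.0 - eps, x)) / (2 * eps)
```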




Chapter 14 


The Ito Integral 


The following chapters deal with Stochastic Differential Equations in Finance. References: 


1. B. Oksendal, Stochastic Differential Equations, Springer-Verlag, 1995 


2. J. Hull, Options, Futures and other Derivative Securities, Prentice Hall, 1993. 


14.1 Brownian Motion 


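(See Fig. 13.3.) $(\Omega, \mathcal{F}, \mathbb{P})$ is given, always in the background, even when not explicitly mentioned. Brownian motion, $B(t, \omega) : [0, \infty) \times \Omega \to \mathbb{R}$, has the following properties:

1. $B(0) = 0$; technically, $\mathbb{P}\{\omega;\ B(0, \omega) = 0\} = 1$,
2. $B(t)$ is a continuous function of $t$,
3. If $0 = t_0 \le t_1 \le \ldots \le t_n$, then the increments

$$B(t_1) - B(t_0), \quad \ldots, \quad B(t_n) - B(t_{n-1})$$

are independent, normal, and

$$\mathbb{E}[B(t_{k+1}) - B(t_k)] = 0, \qquad \mathbb{E}[B(t_{k+1}) - B(t_k)]^2 = t_{k+1} - t_k.$$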

14.2 First Variation 


Quadratic variation is a measure of volatility. First we will consider first variation, $FV(f)$, of a function $f(t)$.

Figure 14.1: Example function $f(t)$.

For the function pictured in Fig. 14.1, the first variation over the interval $[0, T]$ is given by:

$$FV_{[0,T]}(f) = [f(t_1) - f(0)] - [f(t_2) - f(t_1)] + [f(T) - f(t_2)].$$

Thus, first variation measures the total amount of up and down motion of the path.

The general definition of first variation is as follows:

Definition 14.1 (First Variation) Let $\Pi = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, T]$, i.e.,

$$0 = t_0 \le t_1 \le \ldots \le t_n = T.$$

The mesh of the partition is defined to be

$$\|\Pi\| = \max_{k = 0, \ldots, n-1} (t_{k+1} - t_k).$$

We then define

$$FV_{[0,T]}(f) = \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|.$$

Suppose $f$ is differentiable. Then the Mean Value Theorem implies that in each subinterval $[t_k, t_{k+1}]$, there is a point $t_k^*$ such that

$$f(t_{k+1}) - f(t_k) = f'(t_k^*)(t_{k+1} - t_k).$$

Then

$$\sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)| = \sum_{k=0}^{n-1} |f'(t_k^*)| (t_{k+1} - t_k),$$

and

$$FV_{[0,T]}(f) = \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f'(t_k^*)| (t_{k+1} - t_k) = \int_0^T |f'(t)| \, dt.$$
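14.3 Quadratic Variation

Definition 14.2 (Quadratic Variation) The quadratic variation of a function $f$ on an interval $[0, T]$ is

$$\langle f \rangle(T) = \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|^2.$$

Remark 14.1 (Quadratic Variation of Differentiable Functions) If $f$ is differentiable, then $\langle f \rangle(T) = 0$, because

$$\sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|^2 = \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k)^2 \le \|\Pi\| \cdot \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k)$$

and

$$\langle f \rangle(T) \le \lim_{\|\Pi\| \to 0} \|\Pi\| \cdot \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k) = \lim_{\|\Pi\| \to 0} \|\Pi\| \int_0^T |f'(t)|^2 \, dt = 0.$$

Theorem 3.44

$$\langle B \rangle(T) = T,$$

or more precisely,

$$\mathbb{P}\{\omega \in \Omega;\ \langle B(\cdot, \omega) \rangle(T) = T\} = 1.$$

In particular, the paths of Brownian motion are not differentiable.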


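Proof: (Outline) Let $\Pi = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, T]$. To simplify notation, set $D_k = B(t_{k+1}) - B(t_k)$. Define the sample quadratic variation

$$Q_\Pi = \sum_{k=0}^{n-1} D_k^2.$$

Then

$$Q_\Pi - T = \sum_{k=0}^{n-1} \left[ D_k^2 - (t_{k+1} - t_k) \right].$$

We want to show that

$$\lim_{\|\Pi\| \to 0} (Q_\Pi - T) = 0.$$

Consider an individual summand

$$D_k^2 - (t_{k+1} - t_k) = [B(t_{k+1}) - B(t_k)]^2 - (t_{k+1} - t_k).$$

This has expectation 0, so

$$\mathbb{E}(Q_\Pi - T) = \mathbb{E} \sum_{k=0}^{n-1} \left[ D_k^2 - (t_{k+1} - t_k) \right] = 0.$$

For $j \ne k$, the terms

$$D_j^2 - (t_{j+1} - t_j) \quad \text{and} \quad D_k^2 - (t_{k+1} - t_k)$$

are independent, so

$$\operatorname{var}(Q_\Pi - T) = \sum_{k=0}^{n-1} \operatorname{var}\left[ D_k^2 - (t_{k+1} - t_k) \right]$$

$$= \sum_{k=0}^{n-1} \mathbb{E}\left[ D_k^4 - 2(t_{k+1} - t_k) D_k^2 + (t_{k+1} - t_k)^2 \right]$$

$$= \sum_{k=0}^{n-1} \left[ 3(t_{k+1} - t_k)^2 - 2(t_{k+1} - t_k)^2 + (t_{k+1} - t_k)^2 \right]$$

(if $X$ is normal with mean 0 and variance $\sigma^2$, then $\mathbb{E}(X^4) = 3\sigma^4$)

$$= 2 \sum_{k=0}^{n-1} (t_{k+1} - t_k)^2$$

$$\le 2 \|\Pi\| \sum_{k=0}^{n-1} (t_{k+1} - t_k)$$

$$= 2 \|\Pi\|\, T.$$

Thus we have

$$\mathbb{E}(Q_\Pi - T) = 0,$$

$$\operatorname{var}(Q_\Pi - T) \le 2 \|\Pi\| \cdot T.$$

As $\|\Pi\| \to 0$, $\operatorname{var}(Q_\Pi - T) \to 0$, so

$$\lim_{\|\Pi\| \to 0} (Q_\Pi - T) = 0.$$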

Remark 14.2 (Differential Representation) We know that

$$\mathbb{E}\left[ (B(t_{k+1}) - B(t_k))^2 - (t_{k+1} - t_k) \right] = 0.$$

We showed above that

$$\operatorname{var}\left[ (B(t_{k+1}) - B(t_k))^2 - (t_{k+1} - t_k) \right] = 2(t_{k+1} - t_k)^2.$$

When $(t_{k+1} - t_k)$ is small, $(t_{k+1} - t_k)^2$ is very small, and we have the approximate equation

$$(B(t_{k+1}) - B(t_k))^2 \simeq t_{k+1} - t_k,$$

which we can write informally as

$$dB(t) \, dB(t) = dt.$$
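
The contrast between Brownian paths and differentiable functions shows up clearly in a simulation. The sketch below is illustrative: it approximates a Brownian path on a uniform grid by summing independent $N(0, \Delta t)$ increments, and computes the sample quadratic variation $\sum_k (f(t_{k+1}) - f(t_k))^2$ both for that path and for the smooth function $\sin t$ on the same grid. The seed, horizon $T$, grid size and function names are assumptions.

```python
import random
from math import sin, sqrt


def sample_quadratic_variation(values):
    """Sum of squared increments of a path sampled on a partition."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def brownian_path(T, n, rng):
    """Brownian motion sampled at k*T/n, k = 0..n, from N(0, T/n) increments."""
    dt = T / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sqrt(dt)))
    return path


rng = random.Random(1)  # fixed seed for reproducibility
T, n = 2.0, 100_000

q_brownian = sample_quadratic_variation(brownian_path(T, n, rng))
q_smooth = sample_quadratic_variation([sin(k * T / n) for k in range(n + 1)])
```

On a fine grid `q_brownian` is close to $T$, with fluctuations of standard deviation at most $\sqrt{2\|\Pi\| T}$, while `q_smooth` is of order $\|\Pi\|$ and therefore close to 0.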


14.4 Quadratic Variation as Absolute Volatility 


On any time interval $[T_1, T_2]$, we can sample the Brownian motion at times

$$T_1 = t_0 \le t_1 \le \ldots \le t_n = T_2$$

and compute the squared sample absolute volatility

$$\frac{1}{T_2 - T_1} \sum_{k=0}^{n-1} (B(t_{k+1}) - B(t_k))^2.$$

This is approximately equal to

$$\frac{1}{T_2 - T_1} \left[ \langle B \rangle(T_2) - \langle B \rangle(T_1) \right] = \frac{T_2 - T_1}{T_2 - T_1} = 1.$$

As we increase the number of sample points, this approximation becomes exact. In other words, Brownian motion has absolute volatility 1.

Furthermore, consider the equation

$$\langle B \rangle(T) = T = \int_0^T 1 \, dt, \quad \forall T \ge 0.$$

This says that quadratic variation for Brownian motion accumulates at rate 1 at all times along almost every path.


14.5 Construction of the Itô Integral

The integrator is Brownian motion $B(t)$, $t \ge 0$, with associated filtration $\mathcal{F}(t)$, $t \ge 0$, and the following properties:

1. $s \le t \implies$ every set in $\mathcal{F}(s)$ is also in $\mathcal{F}(t)$,
2. $B(t)$ is $\mathcal{F}(t)$-measurable, $\forall t$,
3. For $t \le t_1 \le \ldots \le t_n$, the increments $B(t_1) - B(t), B(t_2) - B(t_1), \ldots, B(t_n) - B(t_{n-1})$ are independent of $\mathcal{F}(t)$.

The integrand is $\delta(t)$, $t \ge 0$, where

1. $\delta(t)$ is $\mathcal{F}(t)$-measurable $\forall t$ (i.e., $\delta$ is adapted),
2. $\delta$ is square-integrable:

$$\mathbb{E} \int_0^T \delta^2(t) \, dt < \infty, \quad \forall T.$$

We want to define the Itô Integral:

$$I(t) = \int_0^t \delta(u) \, dB(u), \quad t \ge 0.$$

Remark 14.3 (Integral w.r.t. a differentiable function) If $f(t)$ is a differentiable function, then we can define

$$\int_0^t \delta(u) \, df(u) = \int_0^t \delta(u) f'(u) \, du.$$

This won't work when the integrator is Brownian motion, because the paths of Brownian motion are not differentiable.


14.6 It6 integral of an elementary integrand 
Let II = {to, t1,... , tn} be a partition of [0, 7], ie., 
C= he See, 


Assume that 6(f) is constant on each subinterval [t,,t,41] (see Fig. 14.2). We call such a é an 
elementary process. 


The functions $B(t)$ and $\delta(t_k)$ can be interpreted as follows:

• Think of $B(t)$ as the price per unit share of an asset at time $t$.
• Think of $t_0, t_1, \dots, t_n$ as the trading dates for the asset.
• Think of $\delta(t_k)$ as the number of shares of the asset acquired at trading date $t_k$ and held until
trading date $t_{k+1}$.

Figure 14.2: An elementary function $\delta$.


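Then the Itô integral $I(t)$ can be interpreted as the gain from trading at time $t$; this gain is given by:
$$I(t) = \begin{cases}
\delta(t_0)[B(t) - B(t_0)], & 0 \le t \le t_1, \\
\delta(t_0)[B(t_1) - B(t_0)] + \delta(t_1)[B(t) - B(t_1)], & t_1 \le t \le t_2, \\
\delta(t_0)[B(t_1) - B(t_0)] + \delta(t_1)[B(t_2) - B(t_1)] + \delta(t_2)[B(t) - B(t_2)], & t_2 \le t \le t_3,
\end{cases}$$
where $B(t_0) = B(0) = 0$. In general, if $t_k \le t \le t_{k+1}$,
$$I(t) = \sum_{j=0}^{k-1} \delta(t_j)[B(t_{j+1}) - B(t_j)] + \delta(t_k)[B(t) - B(t_k)].$$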
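
The gain-from-trading formula can be sketched in code as follows. This is only an illustration under our own naming: `times`, `delta` and the callable `B` are hypothetical inputs standing for the partition, the positions held on each subinterval, and one path of the integrator.

```python
def elementary_ito_integral(t, times, delta, B):
    """Gain I(t) from holding delta[j] shares on [times[j], times[j+1]].

    times : partition points t_0 <= t_1 <= ... <= t_n
    delta : delta[j] is the position held on [t_j, t_{j+1}] (len(times) - 1 entries)
    B     : callable giving the value B(s) of the integrator path at time s
    """
    total = 0.0
    for j in range(len(times) - 1):
        if t <= times[j]:
            break                                # intervals after t contribute nothing
        end = min(t, times[j + 1])               # the last interval is cut off at t
        total += delta[j] * (B(end) - B(times[j]))
    return total
```

Any function of time can be passed as `B` to check the bookkeeping; for example, with `B = lambda s: s**2`, `times = [0, 1, 2, 3]`, `delta = [2, -1, 3]` and `t = 2.5`, the three terms are $2(1-0)$, $-(4-1)$ and $3(6.25-4)$.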


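14.7 Properties of the Itô integral of an elementary process

Adaptedness For each $t$, $I(t)$ is $\mathcal{F}(t)$-measurable.

Linearity If
$$I(t) = \int_0^t \delta(u) \, dB(u), \quad J(t) = \int_0^t \gamma(u) \, dB(u),$$
then
$$I(t) \pm J(t) = \int_0^t (\delta(u) \pm \gamma(u)) \, dB(u)$$
and
$$c I(t) = \int_0^t c \, \delta(u) \, dB(u).$$

Martingale $I(t)$ is a martingale.

Figure 14.3: Showing $s$ and $t$ in different partitions.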


We prove the martingale property for the elementary process case. 


Theorem 7.45 (Martingale Property)
$$I(t) = \sum_{j=0}^{k-1} \delta(t_j)[B(t_{j+1}) - B(t_j)] + \delta(t_k)[B(t) - B(t_k)], \quad t_k \le t \le t_{k+1},$$
is a martingale.


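Proof: Let $0 \le s \le t$ be given. We treat the more difficult case that $s$ and $t$ are in different
subintervals, i.e., there are partition points $t_\ell$ and $t_k$ such that $s \in [t_\ell, t_{\ell+1}]$ and $t \in [t_k, t_{k+1}]$ (See
Fig. 14.3).

Write
$$I(t) = \sum_{j=0}^{\ell-1} \delta(t_j)[B(t_{j+1}) - B(t_j)] + \delta(t_\ell)[B(t_{\ell+1}) - B(t_\ell)] + \sum_{j=\ell+1}^{k-1} \delta(t_j)[B(t_{j+1}) - B(t_j)] + \delta(t_k)[B(t) - B(t_k)].$$
We compute conditional expectations:
$$\mathbb{E}\left[\sum_{j=0}^{\ell-1} \delta(t_j)(B(t_{j+1}) - B(t_j)) \,\Big|\, \mathcal{F}(s)\right] = \sum_{j=0}^{\ell-1} \delta(t_j)(B(t_{j+1}) - B(t_j)),$$
$$\mathbb{E}\left[\delta(t_\ell)(B(t_{\ell+1}) - B(t_\ell)) \,\Big|\, \mathcal{F}(s)\right] = \delta(t_\ell)\left(\mathbb{E}[B(t_{\ell+1}) \mid \mathcal{F}(s)] - B(t_\ell)\right) = \delta(t_\ell)[B(s) - B(t_\ell)].$$
These first two terms add up to $I(s)$. We show that the third and fourth terms are zero:
$$\mathbb{E}\left[\sum_{j=\ell+1}^{k-1} \delta(t_j)(B(t_{j+1}) - B(t_j)) \,\Big|\, \mathcal{F}(s)\right] = \sum_{j=\ell+1}^{k-1} \mathbb{E}\left[\delta(t_j)\left(\mathbb{E}[B(t_{j+1}) \mid \mathcal{F}(t_j)] - B(t_j)\right) \,\Big|\, \mathcal{F}(s)\right] = 0,$$
$$\mathbb{E}\left[\delta(t_k)(B(t) - B(t_k)) \,\Big|\, \mathcal{F}(s)\right] = \mathbb{E}\left[\delta(t_k)\left(\mathbb{E}[B(t) \mid \mathcal{F}(t_k)] - B(t_k)\right) \,\Big|\, \mathcal{F}(s)\right] = 0.$$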
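Theorem 7.46 (Itô Isometry)
$$\mathbb{E} I^2(t) = \mathbb{E} \int_0^t \delta^2(u) \, du.$$
Proof: To simplify notation, assume $t = t_k$, so
$$I(t) = \sum_{j=0}^{k-1} \delta(t_j) \underbrace{[B(t_{j+1}) - B(t_j)]}_{D_j}.$$
Each $D_j$ has expectation 0, and different $D_j$ are independent.
$$I^2(t) = \left(\sum_{j=0}^{k-1} \delta(t_j) D_j\right)^2 = \sum_{j=0}^{k-1} \delta^2(t_j) D_j^2 + 2 \sum_{i<j} \delta(t_i)\delta(t_j) D_i D_j.$$
Since the cross terms have expectation zero,
$$\mathbb{E} I^2(t) = \sum_{j=0}^{k-1} \mathbb{E}[\delta^2(t_j) D_j^2]$$
$$= \sum_{j=0}^{k-1} \mathbb{E}\left[\delta^2(t_j) \, \mathbb{E}\left[(B(t_{j+1}) - B(t_j))^2 \mid \mathcal{F}(t_j)\right]\right]$$
$$= \sum_{j=0}^{k-1} \mathbb{E}\, \delta^2(t_j)(t_{j+1} - t_j)$$
$$= \mathbb{E} \sum_{j=0}^{k-1} \int_{t_j}^{t_{j+1}} \delta^2(u) \, du = \mathbb{E} \int_0^t \delta^2(u) \, du.$$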
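
The martingale property and the isometry can be checked by simulation. The sketch below is our own illustration, not part of the proof: it assumes the partition $\{0, 0.5, 1, 2\}$ and the adapted elementary integrand $\delta(t_j) = B(t_j)$, both chosen arbitrarily.

```python
import numpy as np

def isometry_check(n_paths=200_000, seed=0):
    """Monte Carlo estimates of E I(t), E I^2(t) and E int_0^t delta^2(u) du at t = 2."""
    rng = np.random.default_rng(seed)
    times = np.array([0.0, 0.5, 1.0, 2.0])          # assumed partition
    dt = np.diff(times)
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, len(dt)))
    B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)
    delta = B[:, :-1]            # delta(t_j) = B(t_j): known at time t_j, hence adapted
    I = np.sum(delta * dB, axis=1)
    mean_I = float(np.mean(I))                               # should be near 0
    lhs = float(np.mean(I**2))                               # E I^2(t)
    rhs = float(np.mean(np.sum(delta**2 * dt, axis=1)))      # E of the du-integral
    return mean_I, lhs, rhs
```

For this choice both sides of the isometry equal $\sum_j t_j (t_{j+1} - t_j) = 1.25$, so the two estimates should agree up to sampling error.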


Figure 14.4: Approximating a general process by an elementary process $\delta_4$, over $[0, T]$.


14.8 Itô integral of a general integrand
Fix $T > 0$. Let $\delta$ be a process (not necessarily an elementary process) such that

• $\delta(t)$ is $\mathcal{F}(t)$-measurable, $\forall t \in [0, T]$,
• $\mathbb{E} \int_0^T \delta^2(t) \, dt < \infty$.

Theorem 8.47 There is a sequence of elementary processes $\{\delta_n\}_{n=1}^{\infty}$ such that
$$\lim_{n \to \infty} \mathbb{E} \int_0^T |\delta_n(t) - \delta(t)|^2 \, dt = 0.$$

Proof: Fig. 14.4 shows the main idea.


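In the last section we have defined
$$I_n(T) = \int_0^T \delta_n(t) \, dB(t)$$
for every $n$. We now define
$$\int_0^T \delta(t) \, dB(t) = \lim_{n \to \infty} \int_0^T \delta_n(t) \, dB(t).$$
The only difficulty with this approach is that we need to make sure the above limit exists. Suppose
$n$ and $m$ are large positive integers. Then
$$\operatorname{var}(I_n(T) - I_m(T)) = \mathbb{E}\left(\int_0^T [\delta_n(t) - \delta_m(t)] \, dB(t)\right)^2$$
$$\text{(Itô Isometry:)} \quad = \mathbb{E} \int_0^T [\delta_n(t) - \delta_m(t)]^2 \, dt$$
$$\le \mathbb{E} \int_0^T \left[\,|\delta_n(t) - \delta(t)| + |\delta(t) - \delta_m(t)|\,\right]^2 dt$$
$$((a+b)^2 \le 2a^2 + 2b^2:) \quad \le 2\,\mathbb{E} \int_0^T |\delta_n(t) - \delta(t)|^2 \, dt + 2\,\mathbb{E} \int_0^T |\delta_m(t) - \delta(t)|^2 \, dt,$$
which is small. This guarantees that the sequence $\{I_n(T)\}_{n=1}^{\infty}$ has a limit.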


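14.9 Properties of the (general) Itô integral

$$I(t) = \int_0^t \delta(u) \, dB(u).$$
Here $\delta$ is any adapted, square-integrable process.

Adaptedness. For each $t$, $I(t)$ is $\mathcal{F}(t)$-measurable.

Linearity. If
$$I(t) = \int_0^t \delta(u) \, dB(u), \quad J(t) = \int_0^t \gamma(u) \, dB(u),$$
then
$$I(t) \pm J(t) = \int_0^t (\delta(u) \pm \gamma(u)) \, dB(u)$$
and
$$c I(t) = \int_0^t c \, \delta(u) \, dB(u).$$

Martingale. $I(t)$ is a martingale.
Continuity. $I(t)$ is a continuous function of the upper limit of integration $t$.
Itô Isometry. $\mathbb{E} I^2(t) = \mathbb{E} \int_0^t \delta^2(u) \, du$.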


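Example 14.1 Consider the Itô integral
$$\int_0^T B(u) \, dB(u).$$
We approximate the integrand as shown in Fig. 14.5:
$$\delta_n(u) = \begin{cases}
B(0) = 0 & \text{if } 0 \le u < T/n, \\
B(T/n) & \text{if } T/n \le u < 2T/n, \\
\quad \vdots \\
B\left(\frac{(n-1)T}{n}\right) & \text{if } \frac{(n-1)T}{n} \le u < T.
\end{cases}$$

Figure 14.5: Approximating the integrand $B(u)$ with $\delta_4$, over $[0, T]$.

By definition,
$$\int_0^T B(u) \, dB(u) = \lim_{n \to \infty} \sum_{k=0}^{n-1} B\left(\frac{kT}{n}\right)\left[B\left(\frac{(k+1)T}{n}\right) - B\left(\frac{kT}{n}\right)\right].$$
To simplify notation, we denote
$$B_k = B\left(\frac{kT}{n}\right),$$
so
$$\int_0^T B(u) \, dB(u) = \lim_{n \to \infty} \sum_{k=0}^{n-1} B_k (B_{k+1} - B_k).$$
We compute
$$\tfrac{1}{2} \sum_{k=0}^{n-1} (B_{k+1} - B_k)^2 = \tfrac{1}{2} \sum_{k=0}^{n-1} B_{k+1}^2 - \sum_{k=0}^{n-1} B_k B_{k+1} + \tfrac{1}{2} \sum_{k=0}^{n-1} B_k^2$$
$$= \tfrac{1}{2} B_n^2 + \tfrac{1}{2} \sum_{j=0}^{n-1} B_j^2 - \sum_{k=0}^{n-1} B_k B_{k+1} + \tfrac{1}{2} \sum_{k=0}^{n-1} B_k^2$$
$$= \tfrac{1}{2} B_n^2 + \sum_{k=0}^{n-1} B_k^2 - \sum_{k=0}^{n-1} B_k B_{k+1}$$
$$= \tfrac{1}{2} B_n^2 - \sum_{k=0}^{n-1} B_k (B_{k+1} - B_k).$$
Therefore,
$$\sum_{k=0}^{n-1} B_k (B_{k+1} - B_k) = \tfrac{1}{2} B_n^2 - \tfrac{1}{2} \sum_{k=0}^{n-1} (B_{k+1} - B_k)^2,$$
or equivalently
$$\sum_{k=0}^{n-1} B\left(\frac{kT}{n}\right)\left[B\left(\frac{(k+1)T}{n}\right) - B\left(\frac{kT}{n}\right)\right] = \tfrac{1}{2} B^2(T) - \tfrac{1}{2} \sum_{k=0}^{n-1} \left[B\left(\frac{(k+1)T}{n}\right) - B\left(\frac{kT}{n}\right)\right]^2.$$
Let $n \to \infty$ and use the definition of quadratic variation to get
$$\int_0^T B(u) \, dB(u) = \tfrac{1}{2} B^2(T) - \tfrac{1}{2} T.$$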
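
The computation in this example can be replayed on a simulated path. The following sketch (our own names, uniform partition assumed) returns the left-endpoint sum, the right-hand side of the algebraic identity, and the limiting value $\tfrac{1}{2}B^2(T) - \tfrac{1}{2}T$.

```python
import numpy as np

def left_endpoint_sum(T=1.0, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dB = rng.normal(0.0, np.sqrt(T / n), size=n)       # B_{k+1} - B_k
    B = np.concatenate([[0.0], np.cumsum(dB)])         # B_0 = 0, ..., B_n = B(T)
    ito_sum = float(np.sum(B[:-1] * dB))               # sum_k B_k (B_{k+1} - B_k)
    identity = float(0.5 * B[-1]**2 - 0.5 * np.sum(dB**2))
    limit = float(0.5 * B[-1]**2 - 0.5 * T)
    return ito_sum, identity, limit
```

The first two values agree up to floating-point rounding for every $n$, since the identity is purely algebraic; the third is approached only as $n \to \infty$, because it replaces the sampled quadratic variation by $T$.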


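Remark 14.4 (Reason for the $\tfrac{1}{2}T$ term) If $f$ is differentiable with $f(0) = 0$, then
$$\int_0^T f(u) \, df(u) = \int_0^T f(u) f'(u) \, du = \tfrac{1}{2} f^2(u) \Big|_0^T = \tfrac{1}{2} f^2(T).$$
In contrast, for Brownian motion, we have
$$\int_0^T B(u) \, dB(u) = \tfrac{1}{2} B^2(T) - \tfrac{1}{2} T.$$
The extra term $\tfrac{1}{2}T$ comes from the nonzero quadratic variation of Brownian motion. It has to be
there, because
$$\mathbb{E} \int_0^T B(u) \, dB(u) = 0 \quad \text{(Itô integral is a martingale)}$$
but
$$\mathbb{E}\, \tfrac{1}{2} B^2(T) = \tfrac{1}{2} T.$$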


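14.10 Quadratic variation of an Itô integral
Theorem 10.48 (Quadratic variation of Itô integral) Let
$$I(t) = \int_0^t \delta(u) \, dB(u).$$
Then
$$\langle I \rangle(t) = \int_0^t \delta^2(u) \, du.$$

This holds even if $\delta$ is not an elementary process. The quadratic variation formula says that at each
time $u$, the instantaneous absolute volatility of $I$ is $\delta^2(u)$. This is the absolute volatility of the
Brownian motion scaled by the size of the position (i.e., $\delta(t)$) in the Brownian motion. Informally,
we can write the quadratic variation formula in differential form as follows:
$$dI(t) \, dI(t) = \delta^2(t) \, dt.$$
Compare this with
$$dB(t) \, dB(t) = dt.$$

Proof: (For an elementary process $\delta$). Let $\Pi = \{t_0, t_1, \dots, t_n\}$ be the partition for $\delta$, i.e., $\delta(t) = \delta(t_k)$
for $t_k \le t \le t_{k+1}$. To simplify notation, assume $t = t_n$. We have
$$\langle I \rangle(t) = \sum_{k=0}^{n-1} \left[\langle I \rangle(t_{k+1}) - \langle I \rangle(t_k)\right].$$
Let us compute $\langle I \rangle(t_{k+1}) - \langle I \rangle(t_k)$. Let $\Xi = \{s_0, s_1, \dots, s_m\}$ be a partition
$$t_k = s_0 \le s_1 \le \dots \le s_m = t_{k+1}.$$
Then
$$I(s_{j+1}) - I(s_j) = \int_{s_j}^{s_{j+1}} \delta(t_k) \, dB(u) = \delta(t_k)[B(s_{j+1}) - B(s_j)],$$
so
$$\langle I \rangle(t_{k+1}) - \langle I \rangle(t_k) = \lim_{\|\Xi\| \to 0} \sum_{j=0}^{m-1} [I(s_{j+1}) - I(s_j)]^2 = \delta^2(t_k) \lim_{\|\Xi\| \to 0} \sum_{j=0}^{m-1} [B(s_{j+1}) - B(s_j)]^2 = \delta^2(t_k)(t_{k+1} - t_k).$$
It follows that
$$\langle I \rangle(t) = \sum_{k=0}^{n-1} \delta^2(t_k)(t_{k+1} - t_k) = \sum_{k=0}^{n-1} \int_{t_k}^{t_{k+1}} \delta^2(u) \, du = \int_0^t \delta^2(u) \, du.$$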


Chapter 15 


Ito’s Formula 


15.1 Itô's formula for one Brownian motion

We want a rule to "differentiate" expressions of the form $f(B(t))$, where $f(x)$ is a differentiable
function. If $B(t)$ were also differentiable, then the ordinary chain rule would give
$$\frac{d}{dt} f(B(t)) = f'(B(t)) B'(t),$$
which could be written in differential notation as
$$df(B(t)) = f'(B(t)) B'(t) \, dt = f'(B(t)) \, dB(t).$$
However, $B(t)$ is not differentiable, and in particular has nonzero quadratic variation, so the correct
formula has an extra term, namely,
$$df(B(t)) = f'(B(t)) \, dB(t) + \tfrac{1}{2} f''(B(t)) \underbrace{dt}_{dB(t)\, dB(t)}.$$
This is Itô's formula in differential form. Integrating this, we obtain Itô's formula in integral form:
$$f(B(t)) - \underbrace{f(B(0))}_{f(0)} = \int_0^t f'(B(u)) \, dB(u) + \tfrac{1}{2} \int_0^t f''(B(u)) \, du.$$


Remark 15.1 (Differential vs. Integral Forms) The mathematically meaningful form of Itô's formula
is Itô's formula in integral form:
$$f(B(t)) - f(B(0)) = \int_0^t f'(B(u)) \, dB(u) + \tfrac{1}{2} \int_0^t f''(B(u)) \, du.$$
This is because we have solid definitions for both integrals appearing on the right-hand side. The
first,
$$\int_0^t f'(B(u)) \, dB(u),$$
is an Itô integral, defined in the previous chapter. The second,
$$\int_0^t f''(B(u)) \, du,$$
is a Riemann integral, the type used in freshman calculus.

For paper and pencil computations, the more convenient form of Itô's rule is Itô's formula in differential
form:
$$df(B(t)) = f'(B(t)) \, dB(t) + \tfrac{1}{2} f''(B(t)) \, dt.$$
There is an intuitive meaning but no solid definition for the terms $df(B(t))$, $dB(t)$ and $dt$ appearing
in this formula. This formula becomes mathematically respectable only after we integrate it.


15.2 Derivation of Itô's formula
Consider $f(x) = \tfrac{1}{2} x^2$, so that
$$f'(x) = x, \quad f''(x) = 1.$$
Let $x_k, x_{k+1}$ be numbers. Taylor's formula implies
$$f(x_{k+1}) - f(x_k) = (x_{k+1} - x_k) f'(x_k) + \tfrac{1}{2} (x_{k+1} - x_k)^2 f''(x_k).$$
In this case, Taylor's formula to second order is exact because $f$ is a quadratic function.

In the general case, the above equation is only approximate, and the error is of the order of
$(x_{k+1} - x_k)^3$. The total error will have limit zero in the last step of the following argument.

Fix $T > 0$ and let $\Pi = \{t_0, t_1, \dots, t_n\}$ be a partition of $[0, T]$. Using Taylor's formula, we write:
$$f(B(T)) - f(B(0)) = \sum_{k=0}^{n-1} [f(B(t_{k+1})) - f(B(t_k))]$$
$$= \sum_{k=0}^{n-1} [B(t_{k+1}) - B(t_k)] f'(B(t_k)) + \tfrac{1}{2} \sum_{k=0}^{n-1} [B(t_{k+1}) - B(t_k)]^2 f''(B(t_k))$$
$$= \sum_{k=0}^{n-1} B(t_k)[B(t_{k+1}) - B(t_k)] + \tfrac{1}{2} \sum_{k=0}^{n-1} [B(t_{k+1}) - B(t_k)]^2.$$
We let $\|\Pi\| \to 0$ to obtain
$$f(B(T)) - f(B(0)) = \int_0^T B(u) \, dB(u) + \tfrac{1}{2} \langle B \rangle(T) = \int_0^T f'(B(u)) \, dB(u) + \tfrac{1}{2} \int_0^T f''(B(u)) \, du.$$
This is Itô's formula in integral form for the special case
$$f(x) = \tfrac{1}{2} x^2.$$
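
For a non-quadratic $f$ the Taylor step is only approximate, and one can watch the total error shrink on a fine partition. The sketch below is an illustration of ours, assuming $f(x) = x^3$ (so $f'(x) = 3x^2$, $f''(x) = 6x$) and a uniform partition; it returns the gap between the two sides of the integral form, with both integrals replaced by left-endpoint sums.

```python
import numpy as np

def ito_formula_gap(T=1.0, n=200_000, seed=0):
    """f(B(T)) - f(B(0)) minus the discretized right-hand side, for f(x) = x**3."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    lhs = B[-1]**3                                   # f(B(T)) - f(B(0)), since B(0) = 0
    ito_part = np.sum(3.0 * B[:-1]**2 * dB)          # sum f'(B(t_k)) [B(t_{k+1}) - B(t_k)]
    riemann_part = 0.5 * np.sum(6.0 * B[:-1] * dt)   # (1/2) sum f''(B(t_k)) (t_{k+1} - t_k)
    return float(lhs - ito_part - riemann_part)
```

Dropping `riemann_part` leaves a gap that does not vanish as `n` grows, which is the numerical counterpart of the extra term in Itô's formula.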


15.3. Geometric Brownian motion 


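Definition 15.1 (Geometric Brownian Motion) Geometric Brownian motion is
$$S(t) = S(0) \exp\left\{\sigma B(t) + \left(\mu - \tfrac{1}{2}\sigma^2\right) t\right\},$$
where $\mu$ and $\sigma > 0$ are constant.

Define
$$f(t, x) = S(0) \exp\left\{\sigma x + \left(\mu - \tfrac{1}{2}\sigma^2\right) t\right\},$$
so
$$S(t) = f(t, B(t)).$$
Then
$$f_t = \left(\mu - \tfrac{1}{2}\sigma^2\right) f, \quad f_x = \sigma f, \quad f_{xx} = \sigma^2 f.$$
According to Itô's formula,
$$dS(t) = df(t, B(t))$$
$$= f_t \, dt + f_x \, dB + \tfrac{1}{2} f_{xx} \underbrace{dB \, dB}_{dt}$$
$$= \left(\mu - \tfrac{1}{2}\sigma^2\right) f \, dt + \sigma f \, dB + \tfrac{1}{2} \sigma^2 f \, dt$$
$$= \mu S(t) \, dt + \sigma S(t) \, dB(t).$$
Thus, Geometric Brownian motion in differential form is
$$dS(t) = \mu S(t) \, dt + \sigma S(t) \, dB(t),$$
and Geometric Brownian motion in integral form is
$$S(t) = S(0) + \int_0^t \mu S(u) \, du + \int_0^t \sigma S(u) \, dB(u).$$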
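
One consequence of the integral form is easy to test numerically: taking expectations and using that the Itô integral has mean zero gives $\mathbb{E} S(T) = S(0) e^{\mu T}$. The sketch below, with illustrative names of our own, samples $S(T)$ directly from the definition.

```python
import numpy as np

def gbm_terminal(S0, mu, sigma, T, n_paths, seed=0):
    """Samples of S(T) = S0 * exp(sigma * B(T) + (mu - sigma^2 / 2) * T)."""
    rng = np.random.default_rng(seed)
    B_T = rng.normal(0.0, np.sqrt(T), size=n_paths)   # B(T) is N(0, T)
    return S0 * np.exp(sigma * B_T + (mu - 0.5 * sigma**2) * T)
```

The sample mean of `gbm_terminal(1.0, 0.1, 0.2, 1.0, 400_000)` should be close to $e^{0.1}$; the $-\tfrac{1}{2}\sigma^2$ in the exponent is exactly what makes this hold.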


15.4 Quadratic variation of geometric Brownian motion 
In the integral form of Geometric Brownian motion,
$$S(t) = S(0) + \int_0^t \mu S(u) \, du + \int_0^t \sigma S(u) \, dB(u),$$
the Riemann integral
$$F(t) = \int_0^t \mu S(u) \, du$$
is differentiable with $F'(t) = \mu S(t)$. This term has zero quadratic variation. The Itô integral
$$G(t) = \int_0^t \sigma S(u) \, dB(u)$$
is not differentiable. It has quadratic variation
$$\langle G \rangle(t) = \int_0^t \sigma^2 S^2(u) \, du.$$
Thus the quadratic variation of $S$ is given by the quadratic variation of $G$. In differential notation,
we write
$$dS(t) \, dS(t) = (\mu S(t) \, dt + \sigma S(t) \, dB(t))^2 = \sigma^2 S^2(t) \, dt.$$

15.5 Volatility of Geometric Brownian motion 


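Fix $0 \le T_1 \le T_2$. Let $\Pi = \{t_0, \dots, t_n\}$ be a partition of $[T_1, T_2]$. The squared absolute sample
volatility of $S$ on $[T_1, T_2]$ is
$$\frac{1}{T_2 - T_1} \sum_{k=0}^{n-1} [S(t_{k+1}) - S(t_k)]^2 \simeq \frac{1}{T_2 - T_1} \int_{T_1}^{T_2} \sigma^2 S^2(u) \, du \simeq \sigma^2 S^2(T_1).$$
As $T_2 \downarrow T_1$, the above approximation becomes exact. In other words, the instantaneous relative
volatility of $S$ is $\sigma^2$. This is usually called simply the volatility of $S$.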


15.6 First derivation of the Black-Scholes formula 


Wealth of an investor. An investor begins with nonrandom initial wealth $X_0$ and at each time $t$,
holds $\Delta(t)$ shares of stock. Stock is modelled by a geometric Brownian motion:
$$dS(t) = \mu S(t) \, dt + \sigma S(t) \, dB(t).$$
$\Delta(t)$ can be random, but must be adapted. The investor finances his investing by borrowing or
lending at interest rate $r$.

Let $X(t)$ denote the wealth of the investor at time $t$. Then
$$dX(t) = \Delta(t) \, dS(t) + r[X(t) - \Delta(t) S(t)] \, dt$$
$$= \Delta(t)[\mu S(t) \, dt + \sigma S(t) \, dB(t)] + r[X(t) - \Delta(t) S(t)] \, dt$$
$$= r X(t) \, dt + \Delta(t) S(t) \underbrace{(\mu - r)}_{\text{Risk premium}} dt + \Delta(t) S(t) \sigma \, dB(t).$$

Value of an option. Consider a European option which pays $g(S(T))$ at time $T$. Let $v(t, x)$ denote
the value of this option at time $t$ if the stock price is $S(t) = x$. In other words, the value of the
option at each time $t \in [0, T]$ is
$$v(t, S(t)).$$
The differential of this value is
$$dv(t, S(t)) = v_t \, dt + v_x \, dS + \tfrac{1}{2} v_{xx} \, dS \, dS$$
$$= v_t \, dt + v_x [\mu S \, dt + \sigma S \, dB] + \tfrac{1}{2} v_{xx} \sigma^2 S^2 \, dt$$
$$= \left[v_t + \mu S v_x + \tfrac{1}{2} \sigma^2 S^2 v_{xx}\right] dt + \sigma S v_x \, dB.$$

A hedging portfolio starts with some initial wealth $X_0$ and invests so that the wealth $X(t)$ at each
time tracks $v(t, S(t))$. We saw above that
$$dX(t) = [rX + \Delta(\mu - r) S] \, dt + \sigma S \Delta \, dB.$$
To ensure that $X(t) = v(t, S(t))$ for all $t$, we equate coefficients in their differentials. Equating the
$dB$ coefficients, we obtain the $\Delta$-hedging rule:
$$\Delta(t) = v_x(t, S(t)).$$
Equating the $dt$ coefficients, we obtain:
$$v_t + \mu S v_x + \tfrac{1}{2} \sigma^2 S^2 v_{xx} = rX + \Delta(\mu - r) S.$$
But we have set $\Delta = v_x$, and we are seeking to cause $X$ to agree with $v$. Making these substitutions,
we obtain
$$v_t + \mu S v_x + \tfrac{1}{2} \sigma^2 S^2 v_{xx} = rv + v_x (\mu - r) S$$
(where $v = v(t, S(t))$ and $S = S(t)$), which simplifies to
$$v_t + r S v_x + \tfrac{1}{2} \sigma^2 S^2 v_{xx} = rv.$$
In conclusion, we should let $v$ be the solution to the Black-Scholes partial differential equation
$$v_t(t, x) + r x v_x(t, x) + \tfrac{1}{2} \sigma^2 x^2 v_{xx}(t, x) = r v(t, x)$$
satisfying the terminal condition
$$v(T, x) = g(x).$$
If an investor starts with $X_0 = v(0, S(0))$ and uses the hedge $\Delta(t) = v_x(t, S(t))$, then he will have
$X(t) = v(t, S(t))$ for all $t$, and in particular, $X(T) = g(S(T))$.
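
As a sanity check on the partial differential equation, one can plug in a known solution and evaluate the residual by finite differences. The sketch below assumes the call payoff $g(x) = (x - K)^+$ and uses the standard closed-form call value (the same formula that appears for a call in Section 16.6); the parameter values, step sizes and function names are our own choices.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def call_value(t, x, K=100.0, T=1.0, r=0.05, sigma=0.2):
    """Closed-form value v(t, x) of a European call, valid for t < T and x > 0."""
    tau = T - t
    d1 = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def pde_residual(t, x, r=0.05, sigma=0.2, ht=1e-4, hx=1e-2):
    """v_t + r x v_x + (1/2) sigma^2 x^2 v_xx - r v, via central differences."""
    v = call_value
    v_t = (v(t + ht, x) - v(t - ht, x)) / (2 * ht)
    v_x = (v(t, x + hx) - v(t, x - hx)) / (2 * hx)
    v_xx = (v(t, x + hx) - 2 * v(t, x) + v(t, x - hx)) / hx**2
    return v_t + r * x * v_x + 0.5 * sigma**2 * x**2 * v_xx - r * v(t, x)
```

The residual should be zero up to discretization error at any interior point, and the central difference `v_x` is a numerical stand-in for the hedge $\Delta(t) = v_x(t, S(t))$.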


15.7. Mean and variance of the Cox-Ingersoll-Ross process 


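The Cox-Ingersoll-Ross model for interest rates is
$$dr(t) = a(b - c\, r(t)) \, dt + \sigma \sqrt{r(t)} \, dB(t),$$
where $a$, $b$, $c$, $\sigma$ and $r(0)$ are positive constants. In integral form, this equation is
$$r(t) = r(0) + a \int_0^t (b - c\, r(u)) \, du + \sigma \int_0^t \sqrt{r(u)} \, dB(u).$$
We apply Itô's formula to compute $dr^2(t)$. This is $df(r(t))$, where $f(x) = x^2$. We obtain
$$dr^2(t) = df(r(t))$$
$$= f'(r(t)) \, dr(t) + \tfrac{1}{2} f''(r(t)) \, dr(t) \, dr(t)$$
$$= 2r(t)\left[a(b - c\, r(t)) \, dt + \sigma \sqrt{r(t)} \, dB(t)\right] + \left[a(b - c\, r(t)) \, dt + \sigma \sqrt{r(t)} \, dB(t)\right]^2$$
$$= 2ab\, r(t) \, dt - 2ac\, r^2(t) \, dt + 2\sigma r^{3/2}(t) \, dB(t) + \sigma^2 r(t) \, dt$$
$$= (2ab + \sigma^2) r(t) \, dt - 2ac\, r^2(t) \, dt + 2\sigma r^{3/2}(t) \, dB(t).$$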


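The mean of $r(t)$. The integral form of the CIR equation is
$$r(t) = r(0) + a \int_0^t (b - c\, r(u)) \, du + \sigma \int_0^t \sqrt{r(u)} \, dB(u).$$
Taking expectations and remembering that the expectation of an Itô integral is zero, we obtain
$$\mathbb{E} r(t) = r(0) + a \int_0^t (b - c\, \mathbb{E} r(u)) \, du.$$
Differentiation yields
$$\frac{d}{dt} \mathbb{E} r(t) = a(b - c\, \mathbb{E} r(t)) = ab - ac\, \mathbb{E} r(t),$$
which implies that
$$\frac{d}{dt}\left[e^{act}\, \mathbb{E} r(t)\right] = e^{act}\left[ac\, \mathbb{E} r(t) + \frac{d}{dt} \mathbb{E} r(t)\right] = e^{act} ab.$$
Integration yields
$$e^{act}\, \mathbb{E} r(t) - r(0) = ab \int_0^t e^{acu} \, du = \frac{b}{c}\left(e^{act} - 1\right).$$
We solve for $\mathbb{E} r(t)$:
$$\mathbb{E} r(t) = \frac{b}{c} + e^{-act}\left(r(0) - \frac{b}{c}\right).$$
If $r(0) = \frac{b}{c}$, then $\mathbb{E} r(t) = \frac{b}{c}$ for every $t$. If $r(0) \ne \frac{b}{c}$, then $r(t)$ exhibits mean reversion:
$$\lim_{t \to \infty} \mathbb{E} r(t) = \frac{b}{c}.$$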

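Variance of $r(t)$. The integral form of the equation derived earlier for $dr^2(t)$ is
$$r^2(t) = r^2(0) + (2ab + \sigma^2) \int_0^t r(u) \, du - 2ac \int_0^t r^2(u) \, du + 2\sigma \int_0^t r^{3/2}(u) \, dB(u).$$
Taking expectations, we obtain
$$\mathbb{E} r^2(t) = r^2(0) + (2ab + \sigma^2) \int_0^t \mathbb{E} r(u) \, du - 2ac \int_0^t \mathbb{E} r^2(u) \, du.$$
Differentiation yields
$$\frac{d}{dt} \mathbb{E} r^2(t) = (2ab + \sigma^2)\, \mathbb{E} r(t) - 2ac\, \mathbb{E} r^2(t),$$
which implies that
$$\frac{d}{dt}\, e^{2act}\, \mathbb{E} r^2(t) = e^{2act}\left[2ac\, \mathbb{E} r^2(t) + \frac{d}{dt} \mathbb{E} r^2(t)\right] = e^{2act} (2ab + \sigma^2)\, \mathbb{E} r(t).$$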


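Using the formula already derived for $\mathbb{E} r(t)$ and integrating the last equation, after considerable
algebra we obtain
$$\mathbb{E} r^2(t) = \frac{b\sigma^2}{2ac^2} + \frac{b^2}{c^2} + \left(r(0) - \frac{b}{c}\right)\left(\frac{\sigma^2}{ac} + \frac{2b}{c}\right) e^{-act} + \left(r(0) - \frac{b}{c}\right)^2 e^{-2act} + \frac{\sigma^2}{ac}\left(\frac{b}{2c} - r(0)\right) e^{-2act},$$
$$\operatorname{var} r(t) = \mathbb{E} r^2(t) - (\mathbb{E} r(t))^2 = \frac{b\sigma^2}{2ac^2} + \left(r(0) - \frac{b}{c}\right)\frac{\sigma^2}{ac}\, e^{-act} + \frac{\sigma^2}{ac}\left(\frac{b}{2c} - r(0)\right) e^{-2act}.$$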
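
The moment calculations can be compared with a simulation. The sketch below is ours: `cir_mean` codes the mean formula derived above, `cir_var` codes the standard closed-form CIR variance (mean-reversion speed $ac$, long-run level $b/c$), and `simulate_cir` uses an Euler scheme with truncation at zero, which is a discretization choice assumed here and not part of the model.

```python
import numpy as np

def cir_mean(t, r0, a, b, c):
    return b / c + np.exp(-a * c * t) * (r0 - b / c)

def cir_var(t, r0, a, b, c, sigma):
    k = a * c
    return (b * sigma**2 / (2 * a * c**2)
            + (r0 - b / c) * sigma**2 / k * np.exp(-k * t)
            + sigma**2 / k * (b / (2 * c) - r0) * np.exp(-2 * k * t))

def simulate_cir(T, r0, a, b, c, sigma, n_steps=1000, n_paths=20_000, seed=0):
    """Samples of r(T) from an Euler discretization of the CIR equation."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    r = np.full(n_paths, r0, dtype=float)
    for _ in range(n_steps):
        r_pos = np.maximum(r, 0.0)      # truncation keeps the square root real
        dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        r = r + a * (b - c * r_pos) * dt + sigma * np.sqrt(r_pos) * dB
    return r
```

With, say, $a = 1$, $b = 0.08$, $c = 2$, $\sigma = 0.2$, $r(0) = 0.1$, the sample mean and variance of `simulate_cir(1.0, ...)` should be close to `cir_mean` and `cir_var` at $t = 1$, up to sampling and discretization error.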


15.8 Multidimensional Brownian Motion 


Definition 15.2 ($d$-dimensional Brownian Motion) A $d$-dimensional Brownian Motion is a process
$$B(t) = (B_1(t), \dots, B_d(t))$$
with the following properties:

• Each $B_k(t)$ is a one-dimensional Brownian motion;
• If $i \ne j$, then the processes $B_i(t)$ and $B_j(t)$ are independent.

Associated with a $d$-dimensional Brownian motion, we have a filtration $\{\mathcal{F}(t)\}$ such that

• For each $t$, the random vector $B(t)$ is $\mathcal{F}(t)$-measurable;
• For each $t \le t_1 \le \dots \le t_n$, the vector increments
$$B(t_1) - B(t), \dots, B(t_n) - B(t_{n-1})$$
are independent of $\mathcal{F}(t)$.


15.9 Cross-variations of Brownian motions 


Because each component $B_i$ is a one-dimensional Brownian motion, we have the informal equation
$$dB_i(t) \, dB_i(t) = dt.$$
However, we have:

Theorem 9.49 If $i \ne j$,
$$dB_i(t) \, dB_j(t) = 0.$$

Proof: Let $\Pi = \{t_0, \dots, t_n\}$ be a partition of $[0, T]$. For $i \ne j$, define the sample cross variation
of $B_i$ and $B_j$ on $[0, T]$ to be
$$C_\Pi = \sum_{k=0}^{n-1} [B_i(t_{k+1}) - B_i(t_k)][B_j(t_{k+1}) - B_j(t_k)].$$
The increments appearing on the right-hand side of the above equation are all independent of one
another and all have mean zero. Therefore,
$$\mathbb{E} C_\Pi = 0.$$
We compute $\operatorname{var}(C_\Pi)$. First note that
$$C_\Pi^2 = \sum_{k=0}^{n-1} [B_i(t_{k+1}) - B_i(t_k)]^2 [B_j(t_{k+1}) - B_j(t_k)]^2$$
$$+\, 2 \sum_{\ell < k} [B_i(t_{\ell+1}) - B_i(t_\ell)][B_j(t_{\ell+1}) - B_j(t_\ell)] \cdot [B_i(t_{k+1}) - B_i(t_k)][B_j(t_{k+1}) - B_j(t_k)].$$
All the increments appearing in the sum of cross terms are independent of one another and have
mean zero. Therefore,
$$\operatorname{var}(C_\Pi) = \mathbb{E} C_\Pi^2 = \mathbb{E} \sum_{k=0}^{n-1} [B_i(t_{k+1}) - B_i(t_k)]^2 [B_j(t_{k+1}) - B_j(t_k)]^2.$$
But $[B_i(t_{k+1}) - B_i(t_k)]^2$ and $[B_j(t_{k+1}) - B_j(t_k)]^2$ are independent of one another, and each has
expectation $(t_{k+1} - t_k)$. It follows that
$$\operatorname{var}(C_\Pi) = \sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \le \|\Pi\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) = \|\Pi\| \cdot T.$$
As $\|\Pi\| \to 0$, we have $\operatorname{var}(C_\Pi) \to 0$, so $C_\Pi$ converges to the constant $\mathbb{E} C_\Pi = 0$.
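
The vanishing of the sample cross variation is easy to observe numerically. A minimal sketch, assuming a uniform partition and two independently simulated increment sequences (names are ours):

```python
import numpy as np

def sample_cross_variation(T=1.0, n=100_000, seed=0):
    """C_Pi for two independent Brownian motions on a uniform n-step partition."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dB1 = rng.normal(0.0, np.sqrt(dt), size=n)
    dB2 = rng.normal(0.0, np.sqrt(dt), size=n)    # drawn independently of dB1
    return float(np.sum(dB1 * dB2))
```

On a uniform partition the bound in the proof gives $\operatorname{var}(C_\Pi) = T^2/n$, so the returned value is typically of size $T/\sqrt{n}$.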


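15.10 Multi-dimensional Itô formula

To keep the notation as simple as possible, we write the Itô formula for two processes driven by a
two-dimensional Brownian motion. The formula generalizes to any number of processes driven by
a Brownian motion of any number (not necessarily the same number) of dimensions.

Let $X$ and $Y$ be processes of the form
$$X(t) = X(0) + \int_0^t \alpha(u) \, du + \int_0^t \delta_{11}(u) \, dB_1(u) + \int_0^t \delta_{12}(u) \, dB_2(u),$$
$$Y(t) = Y(0) + \int_0^t \beta(u) \, du + \int_0^t \delta_{21}(u) \, dB_1(u) + \int_0^t \delta_{22}(u) \, dB_2(u).$$
Such processes, consisting of a nonrandom initial condition, plus a Riemann integral, plus one or
more Itô integrals, are called semimartingales. The integrands $\alpha(u)$, $\beta(u)$, and $\delta_{ij}(u)$ can be any
adapted processes. The adaptedness of the integrands guarantees that $X$ and $Y$ are also adapted. In
differential notation, we write
$$dX = \alpha \, dt + \delta_{11} \, dB_1 + \delta_{12} \, dB_2,$$
$$dY = \beta \, dt + \delta_{21} \, dB_1 + \delta_{22} \, dB_2.$$
Given these two semimartingales $X$ and $Y$, the quadratic and cross variations are:
$$dX \, dX = (\alpha \, dt + \delta_{11} \, dB_1 + \delta_{12} \, dB_2)^2$$
$$= \delta_{11}^2 \underbrace{dB_1 \, dB_1}_{dt} + 2\delta_{11}\delta_{12} \underbrace{dB_1 \, dB_2}_{0} + \delta_{12}^2 \underbrace{dB_2 \, dB_2}_{dt}$$
$$= (\delta_{11}^2 + \delta_{12}^2) \, dt,$$
$$dY \, dY = (\beta \, dt + \delta_{21} \, dB_1 + \delta_{22} \, dB_2)^2 = (\delta_{21}^2 + \delta_{22}^2) \, dt,$$
$$dX \, dY = (\alpha \, dt + \delta_{11} \, dB_1 + \delta_{12} \, dB_2)(\beta \, dt + \delta_{21} \, dB_1 + \delta_{22} \, dB_2) = (\delta_{11}\delta_{21} + \delta_{12}\delta_{22}) \, dt.$$
Let $f(t, x, y)$ be a function of three variables, and let $X(t)$ and $Y(t)$ be semimartingales. Then we
have the corresponding Itô formula:
$$df(t, X, Y) = f_t \, dt + f_x \, dX + f_y \, dY + \tfrac{1}{2}\left[f_{xx} \, dX \, dX + 2 f_{xy} \, dX \, dY + f_{yy} \, dY \, dY\right].$$
In integral form, with $X$ and $Y$ as described earlier and with all the variables filled in, this equation
is
$$f(t, X(t), Y(t)) - f(0, X(0), Y(0))$$
$$= \int_0^t \left[f_t + \alpha f_x + \beta f_y + \tfrac{1}{2}(\delta_{11}^2 + \delta_{12}^2) f_{xx} + (\delta_{11}\delta_{21} + \delta_{12}\delta_{22}) f_{xy} + \tfrac{1}{2}(\delta_{21}^2 + \delta_{22}^2) f_{yy}\right] du$$
$$+ \int_0^t [\delta_{11} f_x + \delta_{21} f_y] \, dB_1 + \int_0^t [\delta_{12} f_x + \delta_{22} f_y] \, dB_2,$$
where $f = f(u, X(u), Y(u))$, and for $i, j \in \{1, 2\}$, $\delta_{ij} = \delta_{ij}(u)$ and $B_i = B_i(u)$.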




Chapter 16 


Markov processes and the Kolmogorov 
equations 


16.1 Stochastic Differential Equations 


Consider the stochastic differential equation:
$$dX(t) = a(t, X(t)) \, dt + \sigma(t, X(t)) \, dB(t). \quad \text{(SDE)}$$
Here $a(t, x)$ and $\sigma(t, x)$ are given functions, usually assumed to be continuous in $(t, x)$ and Lipschitz
continuous in $x$, i.e., there is a constant $L$ such that
$$|a(t, x) - a(t, y)| \le L|x - y|, \quad |\sigma(t, x) - \sigma(t, y)| \le L|x - y|$$
for all $t, x, y$.

Let $(t_0, x)$ be given. A solution to (SDE) with the initial condition $(t_0, x)$ is a process $\{X(t)\}_{t \ge t_0}$
satisfying
$$X(t_0) = x,$$
$$X(t) = X(t_0) + \int_{t_0}^t a(s, X(s)) \, ds + \int_{t_0}^t \sigma(s, X(s)) \, dB(s), \quad t \ge t_0.$$
The solution process $\{X(t)\}_{t \ge t_0}$ will be adapted to the filtration $\{\mathcal{F}(t)\}_{t \ge 0}$ generated by the Brownian
motion. If you know the path of the Brownian motion up to time $t$, then you can evaluate
$X(t)$.
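
The remark that $X(t)$ can be evaluated from the Brownian path up to time $t$ is visible in a time-stepping approximation. The sketch below uses the Euler-Maruyama scheme, a standard discretization that is not part of these notes; the function name and its arguments are our own, with `a` and `sigma` passed as callables of $(t, x)$.

```python
import numpy as np

def euler_maruyama(a, sigma, x0, t0, T, n, rng):
    """Approximate the solution of (SDE) on [t0, T] with n uniform Euler steps.

    Returns the time grid, the approximate path, and the Brownian increments used.
    """
    dt = (T - t0) / n
    t = t0 + dt * np.arange(n + 1)
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    X = np.empty(n + 1)
    X[0] = x0
    for k in range(n):
        # X[k+1] depends only on X[k] and the increment over [t_k, t_{k+1}].
        X[k + 1] = X[k] + a(t[k], X[k]) * dt + sigma(t[k], X[k]) * dB[k]
    return t, X, dB
```

For constant coefficients $a(t, x) = a$ and $\sigma(t, x) = 1$ the scheme has no discretization error: it returns $x + a(t - t_0)$ plus the accumulated Brownian increments at every grid point.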
Example 16.1 (Drifted Brownian motion) Let $a$ be a constant and $\sigma = 1$, so
$$dX(t) = a \, dt + dB(t).$$
If $(t_0, x)$ is given and we start with the initial condition
$$X(t_0) = x,$$
then
$$X(t) = x + a(t - t_0) + (B(t) - B(t_0)), \quad t \ge t_0.$$
To compute the differential w.r.t. $t$, treat $t_0$ and $B(t_0)$ as constants:
$$dX(t) = a \, dt + dB(t).$$

Example 16.2 (Geometric Brownian motion) Let $r$ and $\sigma$ be constants. Consider
$$dX(t) = r X(t) \, dt + \sigma X(t) \, dB(t).$$
Given the initial condition
$$X(t_0) = x,$$
the solution is
$$X(t) = x \exp\left\{\sigma(B(t) - B(t_0)) + \left(r - \tfrac{1}{2}\sigma^2\right)(t - t_0)\right\}.$$
Again, to compute the differential w.r.t. $t$, treat $t_0$ and $B(t_0)$ as constants:
$$dX(t) = \left(r - \tfrac{1}{2}\sigma^2\right) X(t) \, dt + \sigma X(t) \, dB(t) + \tfrac{1}{2}\sigma^2 X(t) \, dt$$
$$= r X(t) \, dt + \sigma X(t) \, dB(t).$$


16.2 Markov Property 


Let $0 \le t_0 \le t_1$ be given and let $h(y)$ be a function. Denote by
$$\mathbb{E}^{t_0, x} h(X(t_1))$$
the expectation of $h(X(t_1))$, given that $X(t_0) = x$. Now let $\xi \in \mathbb{R}$ be given, and start with initial
condition
$$X(0) = \xi.$$
We have the Markov property
$$\mathbb{E}^{0, \xi}\left[h(X(t_1)) \mid \mathcal{F}(t_0)\right] = \mathbb{E}^{t_0, X(t_0)} h(X(t_1)).$$
In other words, if you observe the path of the driving Brownian motion from time 0 to time $t_0$, and
based on this information, you want to estimate $h(X(t_1))$, the only relevant information is the value
of $X(t_0)$. You imagine starting the (SDE) at time $t_0$ at value $X(t_0)$, and compute the expected
value of $h(X(t_1))$.


16.3 Transition density

Denote by
$$p(t_0, t_1; x, y)$$
the density (in the $y$ variable) of $X(t_1)$, conditioned on $X(t_0) = x$. In other words,
$$\mathbb{E}^{t_0, x} h(X(t_1)) = \int_{\mathbb{R}} h(y)\, p(t_0, t_1; x, y) \, dy.$$
The Markov property says that for $0 \le t_0 \le t_1$ and for every $\xi$,
$$\mathbb{E}^{0, \xi}\left[h(X(t_1)) \mid \mathcal{F}(t_0)\right] = \int_{\mathbb{R}} h(y)\, p(t_0, t_1; X(t_0), y) \, dy.$$

Example 16.3 (Drifted Brownian motion) Consider the SDE
$$dX(t) = a \, dt + dB(t).$$
Conditioned on $X(t_0) = x$, the random variable $X(t_1)$ is normal with mean $x + a(t_1 - t_0)$ and variance
$(t_1 - t_0)$, i.e.,
$$p(t_0, t_1; x, y) = \frac{1}{\sqrt{2\pi(t_1 - t_0)}} \exp\left\{-\frac{(y - (x + a(t_1 - t_0)))^2}{2(t_1 - t_0)}\right\}.$$
Note that $p$ depends on $t_0$ and $t_1$ only through their difference $t_1 - t_0$. This is always the case when $a(t, x)$
and $\sigma(t, x)$ don't depend on $t$.


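Example 16.4 (Geometric Brownian motion) Recall that the solution to the SDE
$$dX(t) = r X(t) \, dt + \sigma X(t) \, dB(t),$$
with initial condition $X(t_0) = x$, is Geometric Brownian motion:
$$X(t_1) = x \exp\left\{\sigma(B(t_1) - B(t_0)) + \left(r - \tfrac{1}{2}\sigma^2\right)(t_1 - t_0)\right\}.$$
The random variable $B(t_1) - B(t_0)$ has density
$$\mathbb{P}\{B(t_1) - B(t_0) \in db\} = \frac{1}{\sqrt{2\pi(t_1 - t_0)}} \exp\left\{-\frac{b^2}{2(t_1 - t_0)}\right\} db,$$
and we are making the change of variable
$$y = x \exp\left\{\sigma b + \left(r - \tfrac{1}{2}\sigma^2\right)(t_1 - t_0)\right\},$$
or equivalently,
$$b = \frac{1}{\sigma}\left[\log\frac{y}{x} - \left(r - \tfrac{1}{2}\sigma^2\right)(t_1 - t_0)\right].$$
The derivative is
$$\frac{dy}{db} = \sigma y, \quad \text{or equivalently,} \quad db = \frac{dy}{\sigma y}.$$
Therefore,
$$p(t_0, t_1; x, y) \, dy = \mathbb{P}\{X(t_1) \in dy\} = \frac{1}{\sigma y \sqrt{2\pi(t_1 - t_0)}} \exp\left\{-\frac{1}{2(t_1 - t_0)\sigma^2}\left[\log\frac{y}{x} - \left(r - \tfrac{1}{2}\sigma^2\right)(t_1 - t_0)\right]^2\right\} dy.$$
Using the transition density and a fair amount of calculus, one can compute the expected payoff from a
European call:
$$\mathbb{E}^{t, x}(X(T) - K)^+ = \int_0^\infty (y - K)^+ p(t, T; x, y) \, dy$$
$$= e^{r(T - t)} x N\left(\frac{1}{\sigma\sqrt{T - t}}\left[\log\frac{x}{K} + r(T - t) + \tfrac{1}{2}\sigma^2(T - t)\right]\right) - K N\left(\frac{1}{\sigma\sqrt{T - t}}\left[\log\frac{x}{K} + r(T - t) - \tfrac{1}{2}\sigma^2(T - t)\right]\right),$$
where
$$N(\eta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\eta} e^{-x^2/2} \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\eta}^{\infty} e^{-x^2/2} \, dx.$$
Therefore,
$$\mathbb{E}^{0, \xi}\left[e^{-r(T - t)}(X(T) - K)^+ \mid \mathcal{F}(t)\right] = e^{-r(T - t)}\, \mathbb{E}^{t, X(t)}(X(T) - K)^+$$
$$= X(t) N\left(\frac{1}{\sigma\sqrt{T - t}}\left[\log\frac{X(t)}{K} + r(T - t) + \tfrac{1}{2}\sigma^2(T - t)\right]\right) - e^{-r(T - t)} K N\left(\frac{1}{\sigma\sqrt{T - t}}\left[\log\frac{X(t)}{K} + r(T - t) - \tfrac{1}{2}\sigma^2(T - t)\right]\right).$$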
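
The "fair amount of calculus" can be cross-checked by integrating the payoff against the transition density numerically. The sketch below is ours: it writes $\tau = T - t$, truncates the integral at a large upper limit (an approximation we introduce), and uses a plain trapezoid rule.

```python
import math
import numpy as np

def transition_density(tau, x, y, r, sigma):
    """Lognormal density p(t, T; x, y) of Example 16.4, with tau = T - t."""
    z = np.log(y / x) - (r - 0.5 * sigma**2) * tau
    return np.exp(-z**2 / (2.0 * tau * sigma**2)) / (sigma * y * np.sqrt(2.0 * np.pi * tau))

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_call_payoff_closed(tau, x, K, r, sigma):
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + r * tau + 0.5 * sigma**2 * tau) / s
    d_minus = (math.log(x / K) + r * tau - 0.5 * sigma**2 * tau) / s
    return math.exp(r * tau) * x * norm_cdf(d_plus) - K * norm_cdf(d_minus)

def expected_call_payoff_numeric(tau, x, K, r, sigma, n=400_001):
    # Upper limit about ten standard deviations above the median of X(T).
    y_max = x * math.exp((r - 0.5 * sigma**2) * tau + 10.0 * sigma * math.sqrt(tau))
    y = np.linspace(K, max(y_max, 2.0 * K), n)     # the payoff vanishes below K
    f = (y - K) * transition_density(tau, x, y, r, sigma)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y)))   # trapezoid rule
```

The two functions should agree closely; multiplying either by $e^{-r\tau}$ gives the discounted expectation in the last display.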
16.4 The Kolmogorov Backward Equation 
Consider
$$dX(t) = a(t, X(t)) \, dt + \sigma(t, X(t)) \, dB(t),$$
and let $p(t_0, t_1; x, y)$ be the transition density. Then the Kolmogorov Backward Equation is:
$$-\frac{\partial}{\partial t_0} p(t_0, t_1; x, y) = a(t_0, x) \frac{\partial}{\partial x} p(t_0, t_1; x, y) + \tfrac{1}{2}\sigma^2(t_0, x) \frac{\partial^2}{\partial x^2} p(t_0, t_1; x, y). \quad \text{(KBE)}$$
The variables $t_0$ and $x$ in (KBE) are called the backward variables.

In the case that $a$ and $\sigma$ are functions of $x$ alone, $p(t_0, t_1; x, y)$ depends on $t_0$ and $t_1$ only through
their difference $\tau = t_1 - t_0$. We then write $p(\tau; x, y)$ rather than $p(t_0, t_1; x, y)$, and (KBE)
becomes
$$\frac{\partial}{\partial \tau} p(\tau; x, y) = a(x) \frac{\partial}{\partial x} p(\tau; x, y) + \tfrac{1}{2}\sigma^2(x) \frac{\partial^2}{\partial x^2} p(\tau; x, y). \quad \text{(KBE')}$$


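Example 16.5 (Drifted Brownian motion)
$$dX(t) = a \, dt + dB(t),$$
$$p(\tau; x, y) = \frac{1}{\sqrt{2\pi\tau}} \exp\left\{-\frac{(y - x - a\tau)^2}{2\tau}\right\}.$$
We compute
$$\frac{\partial}{\partial \tau} p = p_\tau = \left(\frac{\partial}{\partial \tau} \frac{1}{\sqrt{2\pi\tau}}\right) \exp\left\{-\frac{(y - x - a\tau)^2}{2\tau}\right\} - \left(\frac{\partial}{\partial \tau} \frac{(y - x - a\tau)^2}{2\tau}\right) \frac{1}{\sqrt{2\pi\tau}} \exp\left\{-\frac{(y - x - a\tau)^2}{2\tau}\right\}$$
$$= \left[-\frac{1}{2\tau} + \frac{a(y - x - a\tau)}{\tau} + \frac{(y - x - a\tau)^2}{2\tau^2}\right] p,$$
$$\frac{\partial}{\partial x} p = p_x = \frac{y - x - a\tau}{\tau}\, p,$$
$$\frac{\partial^2}{\partial x^2} p = p_{xx} = \left(\frac{\partial}{\partial x} \frac{y - x - a\tau}{\tau}\right) p + \frac{y - x - a\tau}{\tau}\, p_x = -\frac{1}{\tau}\, p + \frac{(y - x - a\tau)^2}{\tau^2}\, p.$$
Therefore,
$$a p_x + \tfrac{1}{2} p_{xx} = \left[\frac{a(y - x - a\tau)}{\tau} - \frac{1}{2\tau} + \frac{(y - x - a\tau)^2}{2\tau^2}\right] p = p_\tau.$$
This is the Kolmogorov backward equation.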
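
The same verification can be done numerically. This sketch (our own names and step size) evaluates $p_\tau - a p_x - \tfrac{1}{2} p_{xx}$ for the drifted Brownian motion density by central differences.

```python
import math

def p(tau, x, y, a):
    """Transition density of dX = a dt + dB over a time span tau."""
    return math.exp(-(y - x - a * tau)**2 / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)

def kbe_residual(tau, x, y, a, h=1e-4):
    """Finite-difference value of p_tau - a p_x - (1/2) p_xx; should be near zero."""
    p_tau = (p(tau + h, x, y, a) - p(tau - h, x, y, a)) / (2 * h)
    p_x = (p(tau, x + h, y, a) - p(tau, x - h, y, a)) / (2 * h)
    p_xx = (p(tau, x + h, y, a) - 2 * p(tau, x, y, a) + p(tau, x - h, y, a)) / h**2
    return p_tau - a * p_x - 0.5 * p_xx
```

The derivatives are taken in the backward variable $x$ with $y$ held fixed, which is what distinguishes the backward equation from the forward one.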


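Example 16.6 (Geometric Brownian motion)
$$dX(t) = r X(t) \, dt + \sigma X(t) \, dB(t),$$
$$p(\tau; x, y) = \frac{1}{\sigma y \sqrt{2\pi\tau}} \exp\left\{-\frac{1}{2\tau\sigma^2}\left[\log\frac{y}{x} - \left(r - \tfrac{1}{2}\sigma^2\right)\tau\right]^2\right\}.$$
It is true but very tedious to verify that $p$ satisfies the KBE
$$p_\tau = r x p_x + \tfrac{1}{2}\sigma^2 x^2 p_{xx}.$$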


16.5 Connection between stochastic calculus and KBE 


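Consider
$$dX(t) = a(X(t)) \, dt + \sigma(X(t)) \, dB(t). \quad (5.1)$$
Let $h(y)$ be a function, and define
$$v(t, x) = \mathbb{E}^{t, x} h(X(T)),$$
where $0 \le t \le T$. Then
$$v(t, x) = \int h(y)\, p(T - t; x, y) \, dy,$$
$$v_t(t, x) = -\int h(y)\, p_\tau(T - t; x, y) \, dy,$$
$$v_x(t, x) = \int h(y)\, p_x(T - t; x, y) \, dy,$$
$$v_{xx}(t, x) = \int h(y)\, p_{xx}(T - t; x, y) \, dy.$$
Therefore, the Kolmogorov backward equation implies
$$v_t(t, x) + a(x) v_x(t, x) + \tfrac{1}{2}\sigma^2(x) v_{xx}(t, x) = \int h(y)\left[-p_\tau(T - t; x, y) + a(x) p_x(T - t; x, y) + \tfrac{1}{2}\sigma^2(x) p_{xx}(T - t; x, y)\right] dy = 0.$$

Let $(0, \xi)$ be an initial condition for the SDE (5.1). We simplify notation by writing $\mathbb{E}$ rather than
$\mathbb{E}^{0, \xi}$.

Theorem 5.50 Starting at $X(0) = \xi$, the process $v(t, X(t))$ satisfies the martingale property:
$$\mathbb{E}\left[v(t, X(t)) \mid \mathcal{F}(s)\right] = v(s, X(s)), \quad 0 \le s \le t \le T.$$
Proof: According to the Markov property,
$$\mathbb{E}\left[h(X(T)) \mid \mathcal{F}(t)\right] = \mathbb{E}^{t, X(t)} h(X(T)) = v(t, X(t)),$$
so
$$\mathbb{E}\left[v(t, X(t)) \mid \mathcal{F}(s)\right] = \mathbb{E}\left[\mathbb{E}[h(X(T)) \mid \mathcal{F}(t)] \mid \mathcal{F}(s)\right] = \mathbb{E}\left[h(X(T)) \mid \mathcal{F}(s)\right] = \mathbb{E}^{s, X(s)} h(X(T)) = v(s, X(s)),$$
where the last two equalities use the Markov property and the definition of $v$.

Itô's formula implies
$$dv(t, X(t)) = v_t \, dt + v_x \, dX + \tfrac{1}{2} v_{xx} \, dX \, dX$$
$$= v_t \, dt + a v_x \, dt + \sigma v_x \, dB + \tfrac{1}{2}\sigma^2 v_{xx} \, dt.$$
In integral form, we have
$$v(t, X(t)) = v(0, X(0)) + \int_0^t \left[v_t(u, X(u)) + a(X(u)) v_x(u, X(u)) + \tfrac{1}{2}\sigma^2(X(u)) v_{xx}(u, X(u))\right] du + \int_0^t \sigma(X(u)) v_x(u, X(u)) \, dB(u).$$
We know that $v(t, X(t))$ is a martingale, so the integral $\int_0^t \left[v_t + a v_x + \tfrac{1}{2}\sigma^2 v_{xx}\right] du$ must be zero
for all $t$. This implies that the integrand is zero; hence
$$v_t + a v_x + \tfrac{1}{2}\sigma^2 v_{xx} = 0.$$
Thus by two different arguments, one based on the Kolmogorov backward equation, and the other
based on Itô's formula, we have come to the same conclusion.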


Theorem 5.51 (Feynman-Kac) Define
$$v(t, x) = \mathbb{E}^{t, x} h(X(T)), \quad 0 \le t \le T,$$
where
$$dX(t) = a(X(t)) \, dt + \sigma(X(t)) \, dB(t).$$
Then
$$v_t(t, x) + a(x) v_x(t, x) + \tfrac{1}{2}\sigma^2(x) v_{xx}(t, x) = 0 \quad \text{(FK)}$$
and
$$v(T, x) = h(x).$$


The Black-Scholes equation is a special case of this theorem, as we show in the next section. 


Remark 16.1 (Derivation of KBE) We plunked down the Kolmogorov backward equation with- 
out any justification. In fact, one can use It6’s formula to prove the Feynman-Kac Theorem, and use 
the Feynman-Kac Theorem to derive the Kolmogorov backward equation. 


16.6 Black-Scholes 
Consider the SDE
$$dS(t) = r S(t) \, dt + \sigma S(t) \, dB(t).$$
With initial condition
$$S(t) = x,$$
the solution is
$$S(u) = x \exp\left\{\sigma(B(u) - B(t)) + \left(r - \tfrac{1}{2}\sigma^2\right)(u - t)\right\}, \quad u \ge t.$$


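Define
$$v(t, x) = \mathbb{E}^{t, x} h(S(T)) = \mathbb{E}\, h\left(x \exp\left\{\sigma(B(T) - B(t)) + \left(r - \tfrac{1}{2}\sigma^2\right)(T - t)\right\}\right),$$
where $h$ is a function to be specified later.

Recall the Independence Lemma: If $\mathcal{G}$ is a $\sigma$-field, $X$ is $\mathcal{G}$-measurable, and $Y$ is independent of $\mathcal{G}$,
then
$$\mathbb{E}\left[h(X, Y) \mid \mathcal{G}\right] = \gamma(X),$$
where
$$\gamma(x) = \mathbb{E}\, h(x, Y).$$
With geometric Brownian motion, for $0 \le t \le T$, we have
$$S(t) = S(0) \exp\left\{\sigma B(t) + \left(r - \tfrac{1}{2}\sigma^2\right) t\right\},$$
$$S(T) = S(0) \exp\left\{\sigma B(T) + \left(r - \tfrac{1}{2}\sigma^2\right) T\right\} = \underbrace{S(t)}_{\mathcal{F}(t)\text{-measurable}} \underbrace{\exp\left\{\sigma(B(T) - B(t)) + \left(r - \tfrac{1}{2}\sigma^2\right)(T - t)\right\}}_{\text{independent of } \mathcal{F}(t)}.$$
We thus have
$$S(T) = XY,$$
where
$$X = S(t), \quad Y = \exp\left\{\sigma(B(T) - B(t)) + \left(r - \tfrac{1}{2}\sigma^2\right)(T - t)\right\}.$$
Now
$$\mathbb{E}\, h(xY) = v(t, x).$$
The independence lemma implies
$$\mathbb{E}\left[h(S(T)) \mid \mathcal{F}(t)\right] = \mathbb{E}\left[h(XY) \mid \mathcal{F}(t)\right] = v(t, X) = v(t, S(t)).$$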

We have shown that
$$v(t, S(t)) = \mathbb{E}\left[h(S(T)) \mid \mathcal{F}(t)\right], \quad 0 \le t \le T.$$
Note that the random variable $h(S(T))$ whose conditional expectation is being computed does not
depend on $t$. Because of this, the tower property implies that $v(t, S(t)), 0 \le t \le T$, is a martingale:
For $0 \le s \le t \le T$,
$$\mathbb{E}\left[v(t, S(t)) \mid \mathcal{F}(s)\right] = \mathbb{E}\left[\mathbb{E}[h(S(T)) \mid \mathcal{F}(t)] \mid \mathcal{F}(s)\right] = \mathbb{E}\left[h(S(T)) \mid \mathcal{F}(s)\right] = v(s, S(s)).$$


This is a special case of Theorem 5.51. 


Because $v(t, S(t))$ is a martingale, the sum of the $dt$ terms in $dv(t, S(t))$ must be 0. By Itô's
formula,
$$dv(t, S(t)) = \left[v_t(t, S(t)) + r S(t) v_x(t, S(t)) + \tfrac{1}{2}\sigma^2 S^2(t) v_{xx}(t, S(t))\right] dt + \sigma S(t) v_x(t, S(t)) \, dB(t).$$
This leads us to the equation
$$v_t(t, x) + r x v_x(t, x) + \tfrac{1}{2}\sigma^2 x^2 v_{xx}(t, x) = 0, \quad 0 \le t < T, \; x \ge 0.$$
This is a special case of Theorem 5.51 (Feynman-Kac).
Along with the above partial differential equation, we have the terminal condition
$$v(T, x) = h(x), \quad x \ge 0.$$
Furthermore, if $S(t) = 0$ for some $t \in [0, T]$, then also $S(T) = 0$. This gives us the boundary
condition
$$v(t, 0) = h(0), \quad 0 \le t \le T.$$

Finally, we shall eventually see that the value at time $t$ of a contingent claim paying $h(S(T))$ is
$$u(t, x) = e^{-r(T - t)}\, \mathbb{E}^{t, x} h(S(T)) = e^{-r(T - t)} v(t, x)$$
at time $t$ if $S(t) = x$. Therefore,
$$v(t, x) = e^{r(T - t)} u(t, x),$$
$$v_t(t, x) = -r e^{r(T - t)} u(t, x) + e^{r(T - t)} u_t(t, x),$$
$$v_x(t, x) = e^{r(T - t)} u_x(t, x),$$
$$v_{xx}(t, x) = e^{r(T - t)} u_{xx}(t, x).$$
Plugging these formulas into the partial differential equation for $v$ and cancelling the $e^{r(T - t)}$ appearing
in every term, we obtain the Black-Scholes partial differential equation:
$$-r u(t, x) + u_t(t, x) + r x u_x(t, x) + \tfrac{1}{2}\sigma^2 x^2 u_{xx}(t, x) = 0, \quad 0 \le t < T, \; x \ge 0. \quad \text{(BS)}$$
Compare this with the earlier derivation of the Black-Scholes PDE in Section 15.6. 
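In terms of the transition density
$$p(t, T; x, y) = \frac{1}{\sigma y \sqrt{2\pi(T - t)}} \exp\left\{-\frac{1}{2(T - t)\sigma^2}\left[\log\frac{y}{x} - \left(r - \tfrac{1}{2}\sigma^2\right)(T - t)\right]^2\right\}$$
for geometric Brownian motion (See Example 16.4), we have the "stochastic representation"
$$u(t, x) = e^{-r(T - t)}\, \mathbb{E}^{t, x} h(S(T)) = e^{-r(T - t)} \int_0^\infty h(y)\, p(t, T; x, y) \, dy. \quad \text{(SR)}$$
In the case of a call,
$$h(y) = (y - K)^+$$
and
$$u(t, x) = x N\left(\frac{1}{\sigma\sqrt{T - t}}\left[\log\frac{x}{K} + r(T - t) + \tfrac{1}{2}\sigma^2(T - t)\right]\right) - e^{-r(T - t)} K N\left(\frac{1}{\sigma\sqrt{T - t}}\left[\log\frac{x}{K} + r(T - t) - \tfrac{1}{2}\sigma^2(T - t)\right]\right).$$
Even if $h(y)$ is some other function (e.g., $h(y) = (K - y)^+$, a put), $u(t, x)$ is still given by (SR) and
satisfies the Black-Scholes PDE (BS) derived above.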


16.7 Black-Scholes with price-dependent volatility 


$$dS(t) = r S(t) \, dt + \beta(S(t)) \, dB(t),$$
$$v(t, x) = e^{-r(T - t)}\, \mathbb{E}^{t, x}(S(T) - K)^+.$$
The Feynman-Kac Theorem now implies that
$$-r v(t, x) + v_t(t, x) + r x v_x(t, x) + \tfrac{1}{2}\beta^2(x) v_{xx}(t, x) = 0, \quad 0 \le t < T, \; x > 0.$$
$v$ also satisfies the terminal condition
$$v(T, x) = (x - K)^+, \quad x \ge 0,$$
and the boundary condition
$$v(t, 0) = 0, \quad 0 \le t \le T.$$
An example of such a process is the following from J.C. Cox, Notes on options pricing I: Constant
elasticity of variance diffusions, Working Paper, Stanford University, 1975:
$$dS(t) = r S(t) \, dt + \sigma S^\delta(t) \, dB(t),$$
where $0 \le \delta < 1$. The "volatility" $\sigma S^{\delta - 1}(t)$ decreases with increasing stock price. The corresponding
Black-Scholes equation is
$$-r v + v_t + r x v_x + \tfrac{1}{2}\sigma^2 x^{2\delta} v_{xx} = 0, \quad 0 \le t < T, \; x > 0;$$
$$v(t, 0) = 0, \quad 0 \le t \le T,$$
$$v(T, x) = (x - K)^+, \quad x \ge 0.$$



Chapter 17 


Girsanov’s theorem and the risk-neutral 
measure 


(Please see Oksendal, 4th ed., pp 145-151.) 


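Theorem 0.52 (Girsanov, One-dimensional) Let $B(t), 0 \le t \le T$, be a Brownian motion on
a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{F}(t), 0 \le t \le T$, be the accompanying filtration, and let
$\theta(t), 0 \le t \le T$, be a process adapted to this filtration. For $0 \le t \le T$, define
$$\widetilde{B}(t) = \int_0^t \theta(u) \, du + B(t),$$
$$Z(t) = \exp\left\{-\int_0^t \theta(u) \, dB(u) - \tfrac{1}{2} \int_0^t \theta^2(u) \, du\right\},$$
and define a new probability measure by
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T) \, d\mathbb{P}, \quad \forall A \in \mathcal{F}.$$
Under $\widetilde{\mathbb{P}}$, the process $\widetilde{B}(t), 0 \le t \le T$, is a Brownian motion.

Caveat: This theorem requires a technical condition on the size of $\theta$. If
$$\mathbb{E} \exp\left\{\tfrac{1}{2} \int_0^T \theta^2(u) \, du\right\} < \infty,$$
everything is OK.

We make the following remarks:

$Z(t)$ is a martingale. In fact,
$$dZ(t) = -\theta(t) Z(t) \, dB(t) + \tfrac{1}{2}\theta^2(t) Z(t) \, dB(t) \, dB(t) - \tfrac{1}{2}\theta^2(t) Z(t) \, dt = -\theta(t) Z(t) \, dB(t).$$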


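$\widetilde{\mathbb{P}}$ is a probability measure. Since $Z(0) = 1$, we have $\mathbb{E} Z(t) = 1$ for every $t \ge 0$. In particular
$$\widetilde{\mathbb{P}}(\Omega) = \int_\Omega Z(T) \, d\mathbb{P} = \mathbb{E} Z(T) = 1,$$
so $\widetilde{\mathbb{P}}$ is a probability measure.

$\widetilde{\mathbb{E}}$ in terms of $\mathbb{E}$. Let $\widetilde{\mathbb{E}}$ denote expectation under $\widetilde{\mathbb{P}}$. If $X$ is a random variable, then
$$\widetilde{\mathbb{E}} X = \mathbb{E}[Z(T) X].$$
To see this, consider first the case $X = 1_A$, where $A \in \mathcal{F}$. We have
$$\widetilde{\mathbb{E}} X = \widetilde{\mathbb{P}}(A) = \int_A Z(T) \, d\mathbb{P} = \int_\Omega Z(T) 1_A \, d\mathbb{P} = \mathbb{E}[Z(T) X].$$
Now use Williams' "standard machine".

$\widetilde{\mathbb{P}}$ and $\mathbb{P}$. The intuition behind the formula
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T) \, d\mathbb{P}, \quad \forall A \in \mathcal{F},$$
is that we want to have
$$\widetilde{\mathbb{P}}(\omega) = Z(T, \omega)\, \mathbb{P}(\omega),$$
but since $\mathbb{P}(\omega) = 0$ and $\widetilde{\mathbb{P}}(\omega) = 0$, this doesn't really tell us anything useful about $\widetilde{\mathbb{P}}$. Thus,
we consider subsets of $\Omega$, rather than individual elements of $\Omega$.

Distribution of $\widetilde{B}(T)$. If $\theta$ is constant, then
$$Z(T) = \exp\left\{-\theta B(T) - \tfrac{1}{2}\theta^2 T\right\},$$
$$\widetilde{B}(T) = \theta T + B(T).$$
Under $\mathbb{P}$, $B(T)$ is normal with mean 0 and variance $T$, so $\widetilde{B}(T)$ is normal with mean $\theta T$ and
variance $T$:
$$\mathbb{P}(\widetilde{B}(T) \in d\tilde{b}) = \frac{1}{\sqrt{2\pi T}} \exp\left\{-\frac{(\tilde{b} - \theta T)^2}{2T}\right\} d\tilde{b}.$$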


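Removal of Drift from $\widetilde{B}(T)$. The change of measure from $\mathbb{P}$ to $\widetilde{\mathbb{P}}$ removes the drift from $\widetilde{B}(T)$.
To see this, we compute
$$\widetilde{\mathbb{E}} \widetilde{B}(T) = \mathbb{E}\left[Z(T)(\theta T + B(T))\right]$$
$$= \mathbb{E}\left[\exp\left\{-\theta B(T) - \tfrac{1}{2}\theta^2 T\right\}(\theta T + B(T))\right]$$
$$= \frac{1}{\sqrt{2\pi T}} \int_{-\infty}^{\infty} (\theta T + b) \exp\left\{-\theta b - \tfrac{1}{2}\theta^2 T\right\} \exp\left\{-\frac{b^2}{2T}\right\} db$$
$$= \frac{1}{\sqrt{2\pi T}} \int_{-\infty}^{\infty} (\theta T + b) \exp\left\{-\frac{(b + \theta T)^2}{2T}\right\} db$$
$$(y = \theta T + b) \quad = \frac{1}{\sqrt{2\pi T}} \int_{-\infty}^{\infty} y \exp\left\{-\frac{y^2}{2T}\right\} dy \quad \text{(Substitute } y = \theta T + b\text{)}$$
$$= 0.$$
We can also see that $\widetilde{\mathbb{E}} \widetilde{B}(T) = 0$ by arguing directly from the density formula
$$\mathbb{P}\{\widetilde{B}(T) \in d\tilde{b}\} = \frac{1}{\sqrt{2\pi T}} \exp\left\{-\frac{(\tilde{b} - \theta T)^2}{2T}\right\} d\tilde{b}.$$
Because
$$Z(T) = \exp\left\{-\theta B(T) - \tfrac{1}{2}\theta^2 T\right\}$$
$$= \exp\left\{-\theta(\widetilde{B}(T) - \theta T) - \tfrac{1}{2}\theta^2 T\right\}$$
$$= \exp\left\{-\theta \widetilde{B}(T) + \tfrac{1}{2}\theta^2 T\right\},$$
we have
$$\widetilde{\mathbb{P}}\{\widetilde{B}(T) \in d\tilde{b}\} = \mathbb{P}\{\widetilde{B}(T) \in d\tilde{b}\} \exp\left\{-\theta \tilde{b} + \tfrac{1}{2}\theta^2 T\right\}$$
$$= \frac{1}{\sqrt{2\pi T}} \exp\left\{-\frac{(\tilde{b} - \theta T)^2}{2T} - \theta \tilde{b} + \tfrac{1}{2}\theta^2 T\right\} d\tilde{b}$$
$$= \frac{1}{\sqrt{2\pi T}} \exp\left\{-\frac{\tilde{b}^2}{2T}\right\} d\tilde{b}.$$
Under $\widetilde{\mathbb{P}}$, $\widetilde{B}(T)$ is normal with mean zero and variance $T$. Under $\mathbb{P}$, $\widetilde{B}(T)$ is normal with
mean $\theta T$ and variance $T$.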
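
The effect of the change of measure can be seen by reweighting samples. The sketch below is our illustration for constant $\theta$: it draws $B(T)$ under $\mathbb{P}$ and uses $\widetilde{\mathbb{E}} X = \mathbb{E}[Z(T) X]$ to estimate moments of $\widetilde{B}(T)$ under $\widetilde{\mathbb{P}}$; names and parameter values are assumptions.

```python
import numpy as np

def girsanov_moments(theta=0.5, T=1.0, n_paths=400_000, seed=0):
    rng = np.random.default_rng(seed)
    B_T = rng.normal(0.0, np.sqrt(T), size=n_paths)      # B(T) sampled under P
    B_tilde = theta * T + B_T                            # B-tilde(T) = theta*T + B(T)
    Z = np.exp(-theta * B_T - 0.5 * theta**2 * T)        # Z(T) for constant theta
    return (float(np.mean(B_tilde)),           # mean under P, near theta*T
            float(np.mean(Z)),                 # E Z(T), near 1
            float(np.mean(Z * B_tilde)),       # mean under P-tilde, near 0
            float(np.mean(Z * B_tilde**2)))    # second moment under P-tilde, near T
```

The first entry shows the drift $\theta T$ under $\mathbb{P}$; the last two show that after weighting by $Z(T)$ the mean is removed while the variance $T$ is unchanged.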


Means change, variances don’t. When we use the Girsanov Theorem to change the probability 
measure, means change but variances do not. Martingales may be destroyed or created. 
Volatilities, quadratic variations and cross variations are unaffected. Check: 


$$d\widetilde{B} \, d\widetilde{B} = (\theta(t) \, dt + dB(t))^2 = dB \, dB = dt.$$
17.1 Conditional expectations under $\widetilde{\mathbb{P}}$


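Lemma 1.53 Let $0 \le t \le T$. If $X$ is $\mathcal{F}(t)$-measurable, then
$$\widetilde{\mathbb{E}} X = \mathbb{E}[X Z(t)].$$
Proof:
$$\widetilde{\mathbb{E}} X = \mathbb{E}[X Z(T)] = \mathbb{E}\left[\mathbb{E}[X Z(T) \mid \mathcal{F}(t)]\right] = \mathbb{E}\left[X\, \mathbb{E}[Z(T) \mid \mathcal{F}(t)]\right] = \mathbb{E}[X Z(t)]$$
because $Z(t), 0 \le t \le T$, is a martingale under $\mathbb{P}$.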


192 


Lemma 1.54 (Bayes' Rule) If $X$ is $\mathcal{F}(t)$-measurable and $0 \le s \le t \le T$, then
$$\widetilde{\mathbb{E}}[X|\mathcal{F}(s)] = \frac{1}{Z(s)} \mathbb{E}[X Z(t)|\mathcal{F}(s)]. \qquad (1.1)$$


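Proof: It is clear that $\frac{1}{Z(s)}\mathbb{E}[X Z(t)|\mathcal{F}(s)]$ is $\mathcal{F}(s)$-measurable. We check the partial averaging property. For $A \in \mathcal{F}(s)$, we have
$$\begin{aligned}
\int_A \frac{1}{Z(s)} \mathbb{E}[X Z(t)|\mathcal{F}(s)]\,d\widetilde{\mathbb{P}}
&= \widetilde{\mathbb{E}}\left[1_A \frac{1}{Z(s)} \mathbb{E}[X Z(t)|\mathcal{F}(s)]\right] \\
&= \mathbb{E}\big[1_A \mathbb{E}[X Z(t)|\mathcal{F}(s)]\big] && \text{(Lemma 1.53)} \\
&= \mathbb{E}\big[\mathbb{E}[1_A X Z(t)|\mathcal{F}(s)]\big] && \text{(Taking in what is known)} \\
&= \mathbb{E}[1_A X Z(t)] \\
&= \widetilde{\mathbb{E}}[1_A X] && \text{(Lemma 1.53 again)} \\
&= \int_A X\,d\widetilde{\mathbb{P}}.
\end{aligned}$$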
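Bayes' Rule can be illustrated on a finite probability space. The sketch below is a discrete-time analogue and not the continuous-time setting of the lemma: it assumes two independent coin tosses, a head-probability `p` under $\mathbb{P}$ and `p_tilde` under $\widetilde{\mathbb{P}}$, and a hypothetical random variable `X`. All names and values are illustrative assumptions.

```python
from itertools import product

p, p_tilde = 0.5, 0.3                      # assumed head-probabilities under P and P-tilde
omega = list(product("HT", repeat=2))      # all two-toss sequences

def prob(w, ph):
    """Probability of the toss sequence w when each toss is H with probability ph."""
    out = 1.0
    for toss in w:
        out *= ph if toss == "H" else 1.0 - ph
    return out

def cond_exp(f, first, ph):
    """E[f | first toss] under the measure with head-probability ph."""
    paths = [w for w in omega if w[0] == first]
    denom = sum(prob(w, ph) for w in paths)
    return sum(f(w) * prob(w, ph) for w in paths) / denom

def Z2(w):
    """Radon-Nikodym derivative at time 2."""
    return prob(w, p_tilde) / prob(w, p)

def Z1(first):
    """Z at time 1, defined as E[Z2 | F_1] so that Z is a P-martingale."""
    return cond_exp(Z2, first, p)

def X(w):
    """A hypothetical F_2-measurable random variable."""
    return float(w.count("H")) ** 2

for first in "HT":
    lhs = cond_exp(X, first, p_tilde)                              # E~[X | F_1]
    rhs = cond_exp(lambda w: X(w) * Z2(w), first, p) / Z1(first)   # (1/Z_1) E[X Z_2 | F_1]
    print(first, lhs, rhs)
```

In this toy model the two columns agree for both values of the first toss, which is the content of (1.1) with $s = 1$ and $t = 2$.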


Although we have proved Lemmas 1.53 and 1.54, we have not proved Girsanov’s Theorem. We 
will not prove it completely, but here is the beginning of the proof. 


Lemma 1.55 Using the notation of Girsanov's Theorem, we have the martingale property
$$\widetilde{\mathbb{E}}[\widetilde{B}(t)|\mathcal{F}(s)] = \widetilde{B}(s), \qquad 0 \le s \le t \le T.$$


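Proof: We first check that $\widetilde{B}(t) Z(t)$ is a martingale under $\mathbb{P}$. Recall
$$d\widetilde{B}(t) = \theta(t)\,dt + dB(t), \qquad dZ(t) = -\theta(t) Z(t)\,dB(t).$$
Therefore,
$$\begin{aligned}
d(\widetilde{B} Z) &= \widetilde{B}\,dZ + Z\,d\widetilde{B} + d\widetilde{B}\,dZ \\
&= -\widetilde{B}\theta Z\,dB + Z\theta\,dt + Z\,dB - \theta Z\,dt \\
&= (-\widetilde{B}\theta Z + Z)\,dB.
\end{aligned}$$
Next we use Bayes' Rule. For $0 \le s \le t \le T$,
$$\widetilde{\mathbb{E}}[\widetilde{B}(t)|\mathcal{F}(s)] = \frac{1}{Z(s)} \mathbb{E}[\widetilde{B}(t) Z(t)|\mathcal{F}(s)] = \frac{1}{Z(s)} \widetilde{B}(s) Z(s) = \widetilde{B}(s).$$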




Definition 17.1 (Equivalent measures) Two measures on the same probability space which have 
the same measure-zero sets are said to be equivalent. 


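The probability measures $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ of the Girsanov Theorem are equivalent. Recall that $\widetilde{\mathbb{P}}$ is defined by
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad A \in \mathcal{F}.$$
If $\mathbb{P}(A) = 0$, then $\int_A Z(T)\,d\mathbb{P} = 0$. Because $Z(T) > 0$ for every $\omega$, we can invert the definition of $\widetilde{\mathbb{P}}$ to obtain
$$\mathbb{P}(A) = \int_A \frac{1}{Z(T)}\,d\widetilde{\mathbb{P}}, \qquad A \in \mathcal{F}.$$
If $\widetilde{\mathbb{P}}(A) = 0$, then $\int_A \frac{1}{Z(T)}\,d\widetilde{\mathbb{P}} = 0$.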


17.2. Risk-neutral measure 


As usual we are given the Brownian motion $B(t)$, $0 \le t \le T$, with filtration $\mathcal{F}(t)$, $0 \le t \le T$, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We can then define the following.

Stock price:
$$dS(t) = \mu(t) S(t)\,dt + \sigma(t) S(t)\,dB(t).$$

The processes $\mu(t)$ and $\sigma(t)$ are adapted to the filtration. The stock price model is completely general, subject only to the condition that the paths of the process are continuous.


Interest rate: r(t),0 < t < T. The process r(t) is adapted. 


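Wealth of an agent, starting with $X(0) = x$. We can write the wealth process differential in several ways:
$$\begin{aligned}
dX(t) &= \underbrace{\Delta(t)\,dS(t)}_{\text{Capital gains from stock}} + \underbrace{r(t)[X(t) - \Delta(t) S(t)]\,dt}_{\text{Interest earnings}} \\
&= r(t) X(t)\,dt + \Delta(t)[dS(t) - r(t) S(t)\,dt] \\
&= r(t) X(t)\,dt + \Delta(t) \underbrace{(\mu(t) - r(t))}_{\text{Risk premium}} S(t)\,dt + \Delta(t)\sigma(t) S(t)\,dB(t) \\
&= r(t) X(t)\,dt + \Delta(t)\sigma(t) S(t) \Big[ \underbrace{\frac{\mu(t) - r(t)}{\sigma(t)}}_{\text{Market price of risk } = \theta(t)} dt + dB(t) \Big].
\end{aligned}$$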


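Discounted processes:
$$d\left(e^{-\int_0^t r(u)\,du} S(t)\right) = e^{-\int_0^t r(u)\,du}[-r(t) S(t)\,dt + dS(t)],$$
$$d\left(e^{-\int_0^t r(u)\,du} X(t)\right) = e^{-\int_0^t r(u)\,du}[-r(t) X(t)\,dt + dX(t)].$$

Notation:
$$\beta(t) = e^{\int_0^t r(u)\,du}, \qquad \frac{1}{\beta(t)} = e^{-\int_0^t r(u)\,du}, \qquad d\beta(t) = r(t)\beta(t)\,dt, \qquad d\left(\frac{1}{\beta(t)}\right) = -\frac{r(t)}{\beta(t)}\,dt.$$

The discounted formulas are
$$\begin{aligned}
d\left(\frac{S(t)}{\beta(t)}\right) &= \frac{1}{\beta(t)}[-r(t) S(t)\,dt + dS(t)] \\
&= \frac{1}{\beta(t)}[(\mu(t) - r(t)) S(t)\,dt + \sigma(t) S(t)\,dB(t)] \\
&= \frac{1}{\beta(t)} \sigma(t) S(t)[\theta(t)\,dt + dB(t)], \\
d\left(\frac{X(t)}{\beta(t)}\right) &= \Delta(t)\,d\left(\frac{S(t)}{\beta(t)}\right) = \frac{\Delta(t)}{\beta(t)} \sigma(t) S(t)[\theta(t)\,dt + dB(t)].
\end{aligned}$$

Then, with $d\widetilde{B}(t) = \theta(t)\,dt + dB(t)$,
$$d\left(\frac{S(t)}{\beta(t)}\right) = \frac{1}{\beta(t)} \sigma(t) S(t)\,d\widetilde{B}(t), \qquad d\left(\frac{X(t)}{\beta(t)}\right) = \frac{\Delta(t)}{\beta(t)} \sigma(t) S(t)\,d\widetilde{B}(t).$$

Under $\widetilde{\mathbb{P}}$, $\frac{S(t)}{\beta(t)}$ and $\frac{X(t)}{\beta(t)}$ are martingales.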


Definition 17.2 (Risk-neutral measure) A risk-neutral measure (sometimes called a martingale measure) is any probability measure, equivalent to the market measure $\mathbb{P}$, which makes all discounted asset prices martingales.


For the market model considered here,
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad A \in \mathcal{F},$$
where
$$Z(t) = \exp\left\{-\int_0^t \theta(u)\,dB(u) - \tfrac12 \int_0^t \theta^2(u)\,du\right\},$$
is the unique risk-neutral measure. Note that because $\theta(t) = \frac{\mu(t) - r(t)}{\sigma(t)}$, we must assume that $\sigma(t) \ne 0$.

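Risk-neutral valuation. Consider a contingent claim paying an $\mathcal{F}(T)$-measurable random variable $V$ at time $T$.

Example 17.1
$$\begin{aligned}
V &= (S(T) - K)^+, && \text{European call} \\
V &= (K - S(T))^+, && \text{European put} \\
V &= \left(\frac{1}{T}\int_0^T S(u)\,du - K\right)^+, && \text{Asian call} \\
V &= \max_{0 \le t \le T} S(t), && \text{Look back}
\end{aligned}$$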


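If there is a hedging portfolio, i.e., a process $\Delta(t)$, $0 \le t \le T$, whose corresponding wealth process satisfies $X(T) = V$, then
$$X(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right].$$
This is because $\frac{X(t)}{\beta(t)}$ is a martingale under $\widetilde{\mathbb{P}}$, so
$$X(0) = \frac{X(0)}{\beta(0)} = \widetilde{\mathbb{E}}\left[\frac{X(T)}{\beta(T)}\right] = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right].$$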


Chapter 18 


Martingale Representation Theorem 


18.1 Martingale Representation Theorem 


See Oksendal, 4th ed., Theorem 4.11, p.50. 


Theorem 1.56 Let $B(t)$, $0 \le t \le T$, be a Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{F}(t)$, $0 \le t \le T$, be the filtration generated by this Brownian motion. Let $X(t)$, $0 \le t \le T$, be a martingale (under $\mathbb{P}$) relative to this filtration. Then there is an adapted process $\delta(t)$, $0 \le t \le T$, such that
$$X(t) = X(0) + \int_0^t \delta(u)\,dB(u), \qquad 0 \le t \le T.$$
In particular, the paths of $X$ are continuous.


Remark 18.1 We already know that if $X(t)$ is a process satisfying
$$dX(t) = \delta(t)\,dB(t),$$
then $X(t)$ is a martingale. Now we see that if $X(t)$ is a martingale adapted to the filtration generated by the Brownian motion $B(t)$, i.e., the Brownian motion is the only source of randomness in $X$, then
$$dX(t) = \delta(t)\,dB(t)$$
for some $\delta(t)$.


18.2 A hedging application 


Homework Problem 4.5. In the context of Girsanov's Theorem, suppose that $\mathcal{F}(t)$, $0 \le t \le T$, is the filtration generated by the Brownian motion $B$ (under $\mathbb{P}$). Suppose that $Y$ is a $\widetilde{\mathbb{P}}$-martingale. Then there is an adapted process $\gamma(t)$, $0 \le t \le T$, such that
$$Y(t) = Y(0) + \int_0^t \gamma(u)\,d\widetilde{B}(u), \qquad 0 \le t \le T.$$


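$$dS(t) = \mu(t) S(t)\,dt + \sigma(t) S(t)\,dB(t),$$
$$Z(t) = \exp\left\{-\int_0^t \theta(u)\,dB(u) - \tfrac12\int_0^t \theta^2(u)\,du\right\},$$
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad \forall A \in \mathcal{F}.$$

Then
$$d\left(\frac{S(t)}{\beta(t)}\right) = \frac{S(t)}{\beta(t)} \sigma(t)\,d\widetilde{B}(t).$$

Let $\Delta(t)$, $0 \le t \le T$, be a portfolio process. The corresponding wealth process $X(t)$ satisfies
$$d\left(\frac{X(t)}{\beta(t)}\right) = \Delta(t)\sigma(t) \frac{S(t)}{\beta(t)}\,d\widetilde{B}(t),$$
i.e.,
$$\frac{X(t)}{\beta(t)} = X(0) + \int_0^t \Delta(u)\sigma(u) \frac{S(u)}{\beta(u)}\,d\widetilde{B}(u), \qquad 0 \le t \le T.$$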


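Let $V$ be an $\mathcal{F}(T)$-measurable random variable, representing the payoff of a contingent claim at time $T$. We want to choose $X(0)$ and $\Delta(t)$, $0 \le t \le T$, so that
$$X(T) = V.$$

Define the $\widetilde{\mathbb{P}}$-martingale
$$Y(t) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(t)\right], \qquad 0 \le t \le T.$$
According to Homework Problem 4.5, there is an adapted process $\gamma(t)$, $0 \le t \le T$, such that
$$Y(t) = Y(0) + \int_0^t \gamma(u)\,d\widetilde{B}(u), \qquad 0 \le t \le T.$$

Set $X(0) = Y(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right]$ and choose $\Delta(u)$ so that
$$\Delta(u)\sigma(u)\frac{S(u)}{\beta(u)} = \gamma(u).$$

With this choice of $\Delta(u)$, $0 \le u \le T$, we have
$$\frac{X(t)}{\beta(t)} = Y(t) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(t)\right], \qquad 0 \le t \le T.$$
In particular,
$$\frac{X(T)}{\beta(T)} = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(T)\right] = \frac{V}{\beta(T)},$$
so
$$X(T) = V.$$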


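The Martingale Representation Theorem guarantees the existence of a hedging portfolio, although it does not tell us how to compute it. It also justifies the risk-neutral pricing formula
$$\begin{aligned}
X(t) &= \beta(t)\,\widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(t)\right] \\
&= \frac{\beta(t)}{Z(t)}\,\mathbb{E}\left[\frac{Z(T)}{\beta(T)} V\Big|\mathcal{F}(t)\right] \\
&= \frac{1}{\zeta(t)}\,\mathbb{E}[\zeta(T) V|\mathcal{F}(t)], \qquad 0 \le t \le T,
\end{aligned}$$
where
$$\zeta(t) = \frac{Z(t)}{\beta(t)}.$$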


18.3 d-dimensional Girsanov Theorem 


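Theorem 3.57 (d-dimensional Girsanov)
• $B(t) = (B_1(t), \dots, B_d(t))$, $0 \le t \le T$, a d-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$;
• $\mathcal{F}(t)$, $0 \le t \le T$, the accompanying filtration, perhaps larger than the one generated by $B$;
• $\theta(t) = (\theta_1(t), \dots, \theta_d(t))$, $0 \le t \le T$, d-dimensional adapted process.

For $0 \le t \le T$, define
$$\widetilde{B}_j(t) = \int_0^t \theta_j(u)\,du + B_j(t), \qquad j = 1, \dots, d,$$
$$Z(t) = \exp\left\{-\int_0^t \theta(u) \cdot dB(u) - \tfrac12\int_0^t \|\theta(u)\|^2\,du\right\},$$
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}.$$

Then, under $\widetilde{\mathbb{P}}$, the process
$$\widetilde{B}(t) = (\widetilde{B}_1(t), \dots, \widetilde{B}_d(t)), \qquad 0 \le t \le T,$$
is a d-dimensional Brownian motion.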


18.4 d-dimensional Martingale Representation Theorem 


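Theorem 4.58
• $B(t) = (B_1(t), \dots, B_d(t))$, $0 \le t \le T$, a d-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$;
• $\mathcal{F}(t)$, $0 \le t \le T$, the filtration generated by the Brownian motion $B$.

If $X(t)$, $0 \le t \le T$, is a martingale (under $\mathbb{P}$) relative to $\mathcal{F}(t)$, $0 \le t \le T$, then there is a d-dimensional adapted process $\delta(t) = (\delta_1(t), \dots, \delta_d(t))$, such that
$$X(t) = X(0) + \int_0^t \delta(u) \cdot dB(u), \qquad 0 \le t \le T.$$

Corollary 4.59 If we have a d-dimensional adapted process $\theta(t) = (\theta_1(t), \dots, \theta_d(t))$, then we can define $\widetilde{B}$, $Z$ and $\widetilde{\mathbb{P}}$ as in Girsanov's Theorem. If $Y(t)$, $0 \le t \le T$, is a martingale under $\widetilde{\mathbb{P}}$ relative to $\mathcal{F}(t)$, $0 \le t \le T$, then there is a d-dimensional adapted process $\gamma(t) = (\gamma_1(t), \dots, \gamma_d(t))$ such that
$$Y(t) = Y(0) + \int_0^t \gamma(u) \cdot d\widetilde{B}(u), \qquad 0 \le t \le T.$$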


18.5 Multi-dimensional market model

Let $B(t) = (B_1(t), \dots, B_d(t))$, $0 \le t \le T$, be a d-dimensional Brownian motion on some $(\Omega, \mathcal{F}, \mathbb{P})$, and let $\mathcal{F}(t)$, $0 \le t \le T$, be the filtration generated by $B$. Then we can define the following:

Stocks
$$dS_i(t) = \mu_i(t) S_i(t)\,dt + S_i(t) \sum_{j=1}^d \sigma_{ij}(t)\,dB_j(t), \qquad i = 1, \dots, m.$$

Accumulation factor
$$\beta(t) = \exp\left\{\int_0^t r(u)\,du\right\}.$$

Here, $\mu_i(t)$, $\sigma_{ij}(t)$ and $r(t)$ are adapted processes.


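Discounted stock prices
$$\begin{aligned}
d\left(\frac{S_i(t)}{\beta(t)}\right) &= \underbrace{(\mu_i(t) - r(t))}_{\text{Risk premium}} \frac{S_i(t)}{\beta(t)}\,dt + \frac{S_i(t)}{\beta(t)} \sum_{j=1}^d \sigma_{ij}(t)\,dB_j(t) \\
&\overset{?}{=} \frac{S_i(t)}{\beta(t)} \sum_{j=1}^d \sigma_{ij}(t) \underbrace{[\theta_j(t)\,dt + dB_j(t)]}_{d\widetilde{B}_j(t)}. \qquad (5.1)
\end{aligned}$$
For (5.1) to be satisfied, we need to choose $\theta_1(t), \dots, \theta_d(t)$, so that
$$\sum_{j=1}^d \sigma_{ij}(t)\theta_j(t) = \mu_i(t) - r(t), \qquad i = 1, \dots, m. \qquad \text{(MPR)}$$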
j=l 


Market price of risk. The market price of risk is an adapted process $\theta(t) = (\theta_1(t), \dots, \theta_d(t))$ satisfying the system of equations (MPR) above. There are three cases to consider:

Case I: (Unique solution). For Lebesgue-almost every $t$ and $\mathbb{P}$-almost every $\omega$, (MPR) has a unique solution $\theta(t)$. Using $\theta(t)$ in the d-dimensional Girsanov Theorem, we define a unique risk-neutral probability measure $\widetilde{\mathbb{P}}$. Under $\widetilde{\mathbb{P}}$, every discounted stock price is a martingale. Consequently, the discounted wealth process corresponding to any portfolio process is a $\widetilde{\mathbb{P}}$-martingale, and this implies that the market admits no arbitrage. Finally, the Martingale Representation Theorem can be used to show that every contingent claim can be hedged; the market is said to be complete.


Case II: (No solution.) If (MPR) has no solution, then there is no risk-neutral probability measure 
and the market admits arbitrage. 


Case III: (Multiple solutions). If (MPR) has multiple solutions, then there are multiple risk-neutral 
probability measures. The market admits no arbitrage, but there are contingent claims which 
cannot be hedged; the market is said to be incomplete. 


Theorem 5.60 (Fundamental Theorem of Asset Pricing) Part I. (Harrison and Pliska, Martingales and Stochastic integrals in the theory of continuous trading, Stochastic Proc. and Applications 11 (1981), pp 215-260.):
If a market has a risk-neutral probability measure, then it admits no arbitrage.

Part II. (Harrison and Pliska, A stochastic calculus model of continuous trading: complete markets, Stochastic Proc. and Applications 15 (1983), pp 313-316):
The risk-neutral measure is unique if and only if every contingent claim can be hedged.




Chapter 19 


A two-dimensional market model 


Let $B(t) = (B_1(t), B_2(t))$, $0 \le t \le T$, be a two-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{F}(t)$, $0 \le t \le T$, be the filtration generated by $B$.

In what follows, all processes can depend on $t$ and $\omega$, but are adapted to $\mathcal{F}(t)$, $0 \le t \le T$. To simplify notation, we omit the arguments whenever there is no ambiguity.


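Stocks:
$$dS_1 = S_1[\mu_1\,dt + \sigma_1\,dB_1],$$
$$dS_2 = S_2\left[\mu_2\,dt + \rho\sigma_2\,dB_1 + \sqrt{1 - \rho^2}\,\sigma_2\,dB_2\right].$$

We assume $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 \le \rho \le 1$. Note that
$$\begin{aligned}
dS_1\,dS_1 &= S_1^2\sigma_1^2\,dB_1\,dB_1 = \sigma_1^2 S_1^2\,dt, \\
dS_2\,dS_2 &= S_2^2\rho^2\sigma_2^2\,dB_1\,dB_1 + S_2^2(1 - \rho^2)\sigma_2^2\,dB_2\,dB_2 = \sigma_2^2 S_2^2\,dt, \\
dS_1\,dS_2 &= S_1\sigma_1 S_2\rho\sigma_2\,dB_1\,dB_1 = \rho\sigma_1\sigma_2 S_1 S_2\,dt.
\end{aligned}$$

In other words,

• $\frac{dS_1}{S_1}$ has instantaneous variance $\sigma_1^2$,
• $\frac{dS_2}{S_2}$ has instantaneous variance $\sigma_2^2$,
• $\frac{dS_1}{S_1}$ and $\frac{dS_2}{S_2}$ have instantaneous covariance $\rho\sigma_1\sigma_2$.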


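Accumulation factor:
$$\beta(t) = \exp\left\{\int_0^t r\,du\right\}.$$

The market price of risk equations are
$$\begin{aligned}
\sigma_1\theta_1 &= \mu_1 - r, \\
\rho\sigma_2\theta_1 + \sqrt{1 - \rho^2}\,\sigma_2\theta_2 &= \mu_2 - r.
\end{aligned} \qquad \text{(MPR)}$$

The solution to these equations is
$$\theta_1 = \frac{\mu_1 - r}{\sigma_1}, \qquad \theta_2 = \frac{\sigma_1(\mu_2 - r) - \rho\sigma_2(\mu_1 - r)}{\sigma_1\sigma_2\sqrt{1 - \rho^2}},$$
provided $-1 < \rho < 1$.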
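As a quick check of the solution of (MPR), the sketch below evaluates $\theta_1$ and $\theta_2$ and substitutes them back into both equations. The function name and the parameter values are hypothetical and chosen only for illustration; the coefficients are assumed constant.

```python
import math

def market_price_of_risk(mu1, mu2, sigma1, sigma2, rho, r):
    """Solve the two-dimensional (MPR) system for (theta1, theta2), assuming -1 < rho < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("the formula requires -1 < rho < 1")
    theta1 = (mu1 - r) / sigma1
    theta2 = (sigma1 * (mu2 - r) - rho * sigma2 * (mu1 - r)) / (
        sigma1 * sigma2 * math.sqrt(1.0 - rho * rho)
    )
    return theta1, theta2

# Illustrative parameters (assumptions, not taken from the text).
mu1, mu2, sigma1, sigma2, rho, r = 0.10, 0.07, 0.25, 0.15, 0.4, 0.03
theta1, theta2 = market_price_of_risk(mu1, mu2, sigma1, sigma2, rho, r)

# Residuals of the two (MPR) equations; both should vanish up to rounding.
residual1 = sigma1 * theta1 - (mu1 - r)
residual2 = rho * sigma2 * theta1 + math.sqrt(1.0 - rho**2) * sigma2 * theta2 - (mu2 - r)
print(theta1, theta2, residual1, residual2)
```

The guard on `rho` mirrors the proviso above: at $\rho = \pm 1$ the denominator vanishes and the system is either inconsistent or underdetermined.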
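Suppose $-1 < \rho < 1$. Then (MPR) has a unique solution $(\theta_1, \theta_2)$; we define
$$Z(t) = \exp\left\{-\int_0^t \theta_1\,dB_1 - \int_0^t \theta_2\,dB_2 - \tfrac12\int_0^t (\theta_1^2 + \theta_2^2)\,du\right\},$$
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad \forall A \in \mathcal{F}.$$
$\widetilde{\mathbb{P}}$ is the unique risk-neutral measure. Define
$$\widetilde{B}_1(t) = \int_0^t \theta_1\,du + B_1(t), \qquad \widetilde{B}_2(t) = \int_0^t \theta_2\,du + B_2(t).$$

Then
$$dS_1 = S_1\left[r\,dt + \sigma_1\,d\widetilde{B}_1\right],$$
$$dS_2 = S_2\left[r\,dt + \rho\sigma_2\,d\widetilde{B}_1 + \sqrt{1 - \rho^2}\,\sigma_2\,d\widetilde{B}_2\right].$$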


We have changed the mean rates of return of the stock prices, but not the variances and covariances. 


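19.1 Hedging when $-1 < \rho < 1$

$$dX = \Delta_1\,dS_1 + \Delta_2\,dS_2 + r(X - \Delta_1 S_1 - \Delta_2 S_2)\,dt,$$
$$\begin{aligned}
d\left(\frac{X}{\beta}\right) &= \frac{1}{\beta}(dX - rX\,dt) \\
&= \frac{1}{\beta}\Delta_1(dS_1 - rS_1\,dt) + \frac{1}{\beta}\Delta_2(dS_2 - rS_2\,dt) \\
&= \frac{1}{\beta}\Delta_1 S_1\sigma_1\,d\widetilde{B}_1 + \frac{1}{\beta}\Delta_2 S_2\left[\rho\sigma_2\,d\widetilde{B}_1 + \sqrt{1 - \rho^2}\,\sigma_2\,d\widetilde{B}_2\right].
\end{aligned}$$

Let $V$ be $\mathcal{F}(T)$-measurable. Define the $\widetilde{\mathbb{P}}$-martingale
$$Y(t) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(t)\right], \qquad 0 \le t \le T.$$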


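The Martingale Representation Corollary implies
$$Y(t) = Y(0) + \int_0^t \gamma_1\,d\widetilde{B}_1 + \int_0^t \gamma_2\,d\widetilde{B}_2.$$
We have
$$d\left(\frac{X}{\beta}\right) = \left(\frac{1}{\beta}\Delta_1 S_1\sigma_1 + \frac{1}{\beta}\Delta_2 S_2\rho\sigma_2\right)d\widetilde{B}_1 + \frac{1}{\beta}\Delta_2 S_2\sqrt{1 - \rho^2}\,\sigma_2\,d\widetilde{B}_2,$$
$$dY = \gamma_1\,d\widetilde{B}_1 + \gamma_2\,d\widetilde{B}_2.$$

We solve the equations
$$\begin{aligned}
\frac{1}{\beta}\Delta_1 S_1\sigma_1 + \frac{1}{\beta}\Delta_2 S_2\rho\sigma_2 &= \gamma_1, \\
\frac{1}{\beta}\Delta_2 S_2\sqrt{1 - \rho^2}\,\sigma_2 &= \gamma_2
\end{aligned}$$
for the hedging portfolio $(\Delta_1, \Delta_2)$. With this choice of $(\Delta_1, \Delta_2)$ and setting
$$X(0) = Y(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right],$$
we have $\frac{X(t)}{\beta(t)} = Y(t)$, $0 \le t \le T$, and in particular,
$$X(T) = V.$$

Every $\mathcal{F}(T)$-measurable random variable can be hedged; the market is complete.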


19.2 Hedging when $\rho = 1$

The case $\rho = -1$ is analogous. Assume that $\rho = 1$. Then
$$dS_1 = S_1[\mu_1\,dt + \sigma_1\,dB_1],$$
$$dS_2 = S_2[\mu_2\,dt + \sigma_2\,dB_1].$$

The stocks are perfectly correlated.

The market price of risk equations are
$$\begin{aligned}
\sigma_1\theta_1 &= \mu_1 - r, \\
\sigma_2\theta_1 &= \mu_2 - r.
\end{aligned} \qquad \text{(MPR)}$$

The process $\theta_2$ is free. There are two cases:


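Case I: $\frac{\mu_1 - r}{\sigma_1} \ne \frac{\mu_2 - r}{\sigma_2}$. There is no solution to (MPR), and consequently, there is no risk-neutral measure. This market admits arbitrage. Indeed
$$\begin{aligned}
d\left(\frac{X}{\beta}\right) &= \frac{1}{\beta}\Delta_1(dS_1 - rS_1\,dt) + \frac{1}{\beta}\Delta_2(dS_2 - rS_2\,dt) \\
&= \frac{1}{\beta}\Delta_1 S_1[(\mu_1 - r)\,dt + \sigma_1\,dB_1] + \frac{1}{\beta}\Delta_2 S_2[(\mu_2 - r)\,dt + \sigma_2\,dB_1].
\end{aligned}$$
Suppose $\frac{\mu_1 - r}{\sigma_1} > \frac{\mu_2 - r}{\sigma_2}$. Set
$$\Delta_1 = \frac{1}{\sigma_1 S_1}, \qquad \Delta_2 = -\frac{1}{\sigma_2 S_2}.$$
Then
$$\begin{aligned}
d\left(\frac{X}{\beta}\right) &= \frac{1}{\beta}\left[\frac{\mu_1 - r}{\sigma_1}\,dt + dB_1\right] - \frac{1}{\beta}\left[\frac{\mu_2 - r}{\sigma_2}\,dt + dB_1\right] \\
&= \frac{1}{\beta}\underbrace{\left[\frac{\mu_1 - r}{\sigma_1} - \frac{\mu_2 - r}{\sigma_2}\right]}_{\text{Positive}}\,dt.
\end{aligned}$$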
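The arbitrage portfolio of Case I can be examined numerically. The sketch below is only an illustration under assumed constant coefficients; the function name and the parameter values are hypothetical. It evaluates the $dt$ and $dB_1$ coefficients of $d(X/\beta)$ for the portfolio $\Delta_1 = 1/(\sigma_1 S_1)$, $\Delta_2 = -1/(\sigma_2 S_2)$.

```python
def discounted_wealth_coefficients(mu1, mu2, sigma1, sigma2, r, s1, s2, beta):
    """Return the (dt, dB_1) coefficients of d(X/beta) when rho = 1.

    Uses Delta_1 = 1/(sigma1*S1) and Delta_2 = -1/(sigma2*S2), as in Case I.
    """
    delta1 = 1.0 / (sigma1 * s1)
    delta2 = -1.0 / (sigma2 * s2)
    dt_coeff = (delta1 * s1 * (mu1 - r) + delta2 * s2 * (mu2 - r)) / beta
    db_coeff = (delta1 * s1 * sigma1 + delta2 * s2 * sigma2) / beta
    return dt_coeff, db_coeff

# Illustrative values with (mu1 - r)/sigma1 = 0.35 > (mu2 - r)/sigma2 = 0.1.
dt_coeff, db_coeff = discounted_wealth_coefficients(
    mu1=0.10, mu2=0.06, sigma1=0.2, sigma2=0.3, r=0.03, s1=50.0, s2=80.0, beta=1.0
)
print(dt_coeff, db_coeff)
```

The Brownian coefficient cancels while the drift stays strictly positive, so discounted wealth grows without risk, which is the arbitrage described above.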


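Case II: $\frac{\mu_1 - r}{\sigma_1} = \frac{\mu_2 - r}{\sigma_2}$. The market price of risk equations
$$\begin{aligned}
\sigma_1\theta_1 &= \mu_1 - r, \\
\sigma_2\theta_1 &= \mu_2 - r
\end{aligned}$$
have the solution
$$\theta_1 = \frac{\mu_1 - r}{\sigma_1} = \frac{\mu_2 - r}{\sigma_2};$$
$\theta_2$ is free; there are infinitely many risk-neutral measures. Let $\widetilde{\mathbb{P}}$ be one of them.

Hedging:
$$\begin{aligned}
d\left(\frac{X}{\beta}\right) &= \frac{1}{\beta}\Delta_1 S_1[(\mu_1 - r)\,dt + \sigma_1\,dB_1] + \frac{1}{\beta}\Delta_2 S_2[(\mu_2 - r)\,dt + \sigma_2\,dB_1] \\
&= \frac{1}{\beta}\Delta_1 S_1\sigma_1[\theta_1\,dt + dB_1] + \frac{1}{\beta}\Delta_2 S_2\sigma_2[\theta_1\,dt + dB_1] \\
&= \left(\frac{1}{\beta}\Delta_1 S_1\sigma_1 + \frac{1}{\beta}\Delta_2 S_2\sigma_2\right)d\widetilde{B}_1.
\end{aligned}$$

Notice that $\widetilde{B}_2$ does not appear.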


Let $V$ be an $\mathcal{F}(T)$-measurable random variable. If $V$ depends on $B_2$, then it can probably not be hedged. For example, if
$$V = h(S_1(T), S_2(T)),$$
and $\sigma_1$ or $\sigma_2$ depend on $B_2$, then there is trouble.


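More precisely, we define the $\widetilde{\mathbb{P}}$-martingale
$$Y(t) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(t)\right], \qquad 0 \le t \le T.$$

We can write
$$Y(t) = Y(0) + \int_0^t \gamma_1\,d\widetilde{B}_1 + \int_0^t \gamma_2\,d\widetilde{B}_2,$$
so
$$dY = \gamma_1\,d\widetilde{B}_1 + \gamma_2\,d\widetilde{B}_2.$$
To get $d\left(\frac{X}{\beta}\right)$ to match $dY$, we must have
$$\gamma_2 = 0.$$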


Chapter 20 


Pricing Exotic Options 


20.1 Reflection principle for Brownian motion 


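Without drift.
Define
$$M(T) = \max_{0 \le t \le T} B(t).$$

Then we have:
$$\mathbb{P}\{M(T) > m, B(T) < b\} = \mathbb{P}\{B(T) > 2m - b\} = \frac{1}{\sqrt{2\pi T}}\int_{2m - b}^{\infty} \exp\left\{-\frac{x^2}{2T}\right\}dx, \qquad m > 0,\ b < m.$$

So the joint density is
$$\begin{aligned}
\mathbb{P}\{M(T) \in dm, B(T) \in db\} &= -\frac{\partial^2}{\partial m\,\partial b}\left(\frac{1}{\sqrt{2\pi T}}\int_{2m - b}^{\infty} \exp\left\{-\frac{x^2}{2T}\right\}dx\right) dm\,db \\
&= -\frac{\partial}{\partial m}\left(\frac{1}{\sqrt{2\pi T}}\exp\left\{-\frac{(2m - b)^2}{2T}\right\}\right) dm\,db \\
&= \frac{2(2m - b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2m - b)^2}{2T}\right\} dm\,db, \qquad m > 0,\ b < m.
\end{aligned}$$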

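[Figure 20.1: Reflection Principle for Brownian motion without drift. The figure shows a Brownian motion path and its shadow path, which ends at $2m - b$.]

[Figure 20.2: Possible values of $B(T)$, $M(T)$. The pair $(B(T), M(T))$ lies in the region bounded by the line $m = b$.]

With drift. Let
$$\widetilde{B}(t) = \theta t + B(t),$$
where $B(t)$, $0 \le t \le T$, is a Brownian motion (without drift) on $(\Omega, \mathcal{F}, \mathbb{P})$. Define
$$\begin{aligned}
Z(T) &= \exp\{-\theta B(T) - \tfrac12\theta^2 T\} \\
&= \exp\{-\theta(B(T) + \theta T) + \tfrac12\theta^2 T\} \\
&= \exp\{-\theta\widetilde{B}(T) + \tfrac12\theta^2 T\},
\end{aligned}$$
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad \forall A \in \mathcal{F}.$$

Set $\widetilde{M}(T) = \max_{0 \le t \le T} \widetilde{B}(t)$.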


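Under $\widetilde{\mathbb{P}}$, $\widetilde{B}$ is a Brownian motion (without drift), so
$$\widetilde{\mathbb{P}}\{\widetilde{M}(T) \in d\tilde m, \widetilde{B}(T) \in d\tilde b\} = \frac{2(2\tilde m - \tilde b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2\tilde m - \tilde b)^2}{2T}\right\} d\tilde m\,d\tilde b, \qquad \tilde m > 0,\ \tilde b < \tilde m.$$

Let $h(\tilde m, \tilde b)$ be a function of two variables. Because $\frac{1}{Z(T)} = \exp\{\theta\widetilde{B}(T) - \tfrac12\theta^2 T\}$, we have
$$\mathbb{E}\,h(\widetilde{M}(T), \widetilde{B}(T)) = \widetilde{\mathbb{E}}\left[h(\widetilde{M}(T), \widetilde{B}(T))\exp\{\theta\widetilde{B}(T) - \tfrac12\theta^2 T\}\right].$$
Since $h$ is arbitrary, we conclude that
$$\begin{aligned}
\mathbb{P}\{\widetilde{M}(T) \in d\tilde m, \widetilde{B}(T) \in d\tilde b\}
&= \exp\{\theta\tilde b - \tfrac12\theta^2 T\}\,\widetilde{\mathbb{P}}\{\widetilde{M}(T) \in d\tilde m, \widetilde{B}(T) \in d\tilde b\} \\
&= \frac{2(2\tilde m - \tilde b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2\tilde m - \tilde b)^2}{2T}\right\}\exp\{\theta\tilde b - \tfrac12\theta^2 T\}\,d\tilde m\,d\tilde b, \qquad \tilde m > 0,\ \tilde b < \tilde m.
\end{aligned}$$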
20.2 Up and out European call. 


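Let $0 < K < L$ be given. The payoff at time $T$ is
$$(S(T) - K)^+ 1_{\{S^*(T) < L\}},$$
where
$$S^*(T) = \max_{0 \le t \le T} S(t).$$

To simplify notation, assume that $\mathbb{P}$ is already the risk-neutral measure, so the value at time zero of the option is
$$v(0, S(0)) = e^{-rT}\,\mathbb{E}\left[(S(T) - K)^+ 1_{\{S^*(T) < L\}}\right].$$
Because $\mathbb{P}$ is the risk-neutral measure,
$$dS(t) = rS(t)\,dt + \sigma S(t)\,dB(t),$$
$$\begin{aligned}
S(t) &= S_0\exp\{\sigma B(t) + (r - \tfrac12\sigma^2)t\} \\
&= S_0\exp\left\{\sigma\left[B(t) + \left(\frac{r}{\sigma} - \frac{\sigma}{2}\right)t\right]\right\} \\
&= S_0\exp\{\sigma\widetilde{B}(t)\},
\end{aligned}$$
where
$$\theta = \frac{r}{\sigma} - \frac{\sigma}{2}, \qquad \widetilde{B}(t) = \theta t + B(t).$$

Consequently,
$$S^*(t) = S_0\exp\{\sigma\widetilde{M}(t)\},$$
where
$$\widetilde{M}(t) = \max_{0 \le u \le t}\widetilde{B}(u).$$

We compute
$$\begin{aligned}
v(0, S(0)) &= e^{-rT}\,\mathbb{E}\left[(S(T) - K)^+ 1_{\{S^*(T) < L\}}\right] \\
&= e^{-rT}\,\mathbb{E}\left[(S(0)\exp\{\sigma\widetilde{B}(T)\} - K)^+ 1_{\{S(0)\exp\{\sigma\widetilde{M}(T)\} < L\}}\right] \\
&= e^{-rT}\,\mathbb{E}\left[(S(0)\exp\{\sigma\widetilde{B}(T)\} - K)\,1_{\{\widetilde{B}(T) > \tilde b,\ \widetilde{M}(T) < \tilde m\}}\right],
\end{aligned}$$
where
$$\tilde b = \frac{1}{\sigma}\log\frac{K}{S(0)}, \qquad \tilde m = \frac{1}{\sigma}\log\frac{L}{S(0)}.$$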

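[Figure 20.3: Possible values of $\widetilde{B}(T)$, $\widetilde{M}(T)$.]

We consider only the case
$$S(0) \le K < L, \quad \text{so} \quad 0 \le \tilde b < \tilde m.$$
The other case, $K < S(0) \le L$, leads to $\tilde b < 0 \le \tilde m$ and the analysis is similar.

We compute $\int_{\tilde b}^{\tilde m}\int_x^{\tilde m} \dots\,dy\,dx$:
$$\begin{aligned}
v(0, S(0)) &= e^{-rT}\int_{\tilde b}^{\tilde m}\int_x^{\tilde m}(S(0)\exp\{\sigma x\} - K)\frac{2(2y - x)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2y - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dy\,dx \\
&= -e^{-rT}\int_{\tilde b}^{\tilde m}(S(0)\exp\{\sigma x\} - K)\frac{1}{\sqrt{2\pi T}}\exp\left\{-\frac{(2y - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}\Big|_{y = x}^{y = \tilde m}dx \\
&= e^{-rT}\int_{\tilde b}^{\tilde m}(S(0)\exp\{\sigma x\} - K)\frac{1}{\sqrt{2\pi T}}\left[\exp\left\{-\frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T\right\} - \exp\left\{-\frac{(2\tilde m - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}\right]dx \\
&= \frac{1}{\sqrt{2\pi T}}e^{-rT}S(0)\int_{\tilde b}^{\tilde m}\exp\left\{\sigma x - \frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dx \\
&\quad - \frac{1}{\sqrt{2\pi T}}e^{-rT}K\int_{\tilde b}^{\tilde m}\exp\left\{-\frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dx \\
&\quad - \frac{1}{\sqrt{2\pi T}}e^{-rT}S(0)\int_{\tilde b}^{\tilde m}\exp\left\{\sigma x - \frac{(2\tilde m - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dx \\
&\quad + \frac{1}{\sqrt{2\pi T}}e^{-rT}K\int_{\tilde b}^{\tilde m}\exp\left\{-\frac{(2\tilde m - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dx.
\end{aligned}$$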


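The standard method for all these integrals is to complete the square in the exponent and then recognize a cumulative normal distribution. We carry out the details for the first integral and just give the result for the other three. The exponent in the first integrand is
$$\begin{aligned}
\sigma x - \frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T &= -\frac{1}{2T}(x - \sigma T - \theta T)^2 + \tfrac12\sigma^2 T + \sigma\theta T \\
&= -\frac{1}{2T}\left(x - \frac{rT}{\sigma} - \frac{\sigma T}{2}\right)^2 + rT.
\end{aligned}$$

In the first integral we make the change of variable
$$y = \left(x - \frac{rT}{\sigma} - \frac{\sigma T}{2}\right)\Big/\sqrt{T}, \qquad dy = dx/\sqrt{T},$$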


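to obtain
$$\begin{aligned}
\frac{e^{-rT}}{\sqrt{2\pi T}}\int_{\tilde b}^{\tilde m}\exp\left\{\sigma x - \frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dx
&= \frac{1}{\sqrt{2\pi T}}\int_{\tilde b}^{\tilde m}\exp\left\{-\frac{1}{2T}\left(x - \frac{rT}{\sigma} - \frac{\sigma T}{2}\right)^2\right\}dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}}^{\frac{\tilde m}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}}\exp\left\{-\frac{y^2}{2}\right\}dy \\
&= N\left(\frac{\tilde m}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right) - N\left(\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right).
\end{aligned}$$

Putting all four integrals together, we have
$$\begin{aligned}
v(0, S(0)) &= S(0)\left[N\left(\frac{\tilde m}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right) - N\left(\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right)\right] \\
&\quad - e^{-rT}K\left[N\left(\frac{\tilde m}{\sqrt T} - \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right) - N\left(\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right)\right] \\
&\quad - S(0)\exp\left\{2\tilde m\left(\frac{r}{\sigma} + \frac{\sigma}{2}\right)\right\}\left[N\left(\frac{2\tilde m - \tilde b}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right) - N\left(\frac{\tilde m}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right)\right] \\
&\quad + \exp\left\{-rT + 2\tilde m\left(\frac{r}{\sigma} - \frac{\sigma}{2}\right)\right\}K\left[N\left(\frac{2\tilde m - \tilde b}{\sqrt T} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right) - N\left(\frac{\tilde m}{\sqrt T} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right)\right],
\end{aligned}$$
where
$$\tilde b = \frac{1}{\sigma}\log\frac{K}{S(0)}, \qquad \tilde m = \frac{1}{\sigma}\log\frac{L}{S(0)}.$$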

[Figure 20.4: Initial and boundary conditions, including $v(T, x) = (x - K)^+$ and $v(t, 0) = 0$.]


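If we let $L \to \infty$ we obtain the classical Black-Scholes formula
$$\begin{aligned}
v(0, S(0)) &= S(0)\left[1 - N\left(\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right)\right] - e^{-rT}K\left[1 - N\left(\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right)\right] \\
&= S(0)\,N\left(\frac{1}{\sigma\sqrt T}\log\frac{S(0)}{K} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right) - e^{-rT}K\,N\left(\frac{1}{\sigma\sqrt T}\log\frac{S(0)}{K} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right).
\end{aligned}$$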
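The up-and-out price and its limit as $L \to \infty$ can be evaluated numerically. The sketch below is an illustration only: it assumes constant $r$ and $\sigma$, the case $S(0) \le K < L$, and writes the four terms in the standard reflected form obtained by completing the square in each integral. Function names and parameter values are hypothetical.

```python
import math

def N(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def black_scholes_call(s0, k, r, sigma, t):
    """Classical Black-Scholes call price (the L -> infinity limit)."""
    srt = sigma * math.sqrt(t)
    d_plus = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / srt
    return s0 * N(d_plus) - math.exp(-r * t) * k * N(d_plus - srt)

def up_and_out_call(s0, k, barrier, r, sigma, t):
    """Up-and-out call value at time zero, assuming s0 <= k < barrier."""
    if not s0 <= k < barrier:
        raise ValueError("this sketch covers only the case S(0) <= K < L")
    rt = math.sqrt(t)
    b = math.log(k / s0) / sigma          # b-tilde
    m = math.log(barrier / s0) / sigma    # m-tilde
    a = r * rt / sigma                    # r*sqrt(T)/sigma
    h = 0.5 * sigma * rt                  # sigma*sqrt(T)/2
    term1 = s0 * (N(m / rt - a - h) - N(b / rt - a - h))
    term2 = math.exp(-r * t) * k * (N(m / rt - a + h) - N(b / rt - a + h))
    term3 = s0 * math.exp(2.0 * m * (r / sigma + sigma / 2.0)) * (
        N((2.0 * m - b) / rt + a + h) - N(m / rt + a + h))
    term4 = k * math.exp(-r * t + 2.0 * m * (r / sigma - sigma / 2.0)) * (
        N((2.0 * m - b) / rt + a - h) - N(m / rt + a - h))
    return term1 - term2 - term3 + term4

s0, k, r, sigma, t = 100.0, 100.0, 0.05, 0.2, 1.0   # illustrative values
vanilla = black_scholes_call(s0, k, r, sigma, t)
knock_out = up_and_out_call(s0, k, 120.0, r, sigma, t)
far_barrier = up_and_out_call(s0, k, 1.0e6, r, sigma, t)
print(vanilla, knock_out, far_barrier)
```

With a barrier close to the strike the knock-out price is a small fraction of the vanilla price, and with a very distant barrier the two agree, which matches the limit stated above.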


If we replace $T$ by $T - t$ and replace $S(0)$ by $x$ in the formula for $v(0, S(0))$, we obtain a formula for $v(t, x)$, the value of the option at the time $t$ if $S(t) = x$. We have actually derived the formula under the assumption $x \le K < L$, but a similar albeit longer formula can also be derived for $K < x \le L$. We consider the function
$$v(t, x) = \mathbb{E}^{t,x}\left[e^{-r(T - t)}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\right], \qquad 0 \le t \le T,\ 0 \le x \le L.$$
This function satisfies the terminal condition
$$v(T, x) = (x - K)^+, \qquad 0 \le x < L,$$
and the boundary conditions
$$v(t, 0) = 0, \qquad v(t, L) = 0, \qquad 0 \le t \le T.$$

We show that $v$ satisfies the Black-Scholes equation
$$-rv + v_t + rxv_x + \tfrac12\sigma^2 x^2 v_{xx} = 0, \qquad 0 \le t < T,\ 0 \le x \le L.$$


Let $S(0) > 0$ be given and define the stopping time
$$\tau = \min\{t \ge 0;\ S(t) = L\}.$$
Theorem 2.61 The process
$$e^{-r(t \wedge \tau)}v(t \wedge \tau, S(t \wedge \tau)), \qquad 0 \le t \le T,$$
is a martingale.


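Proof: First note that
$$S^*(T) < L \iff \tau > T.$$

Let $\omega \in \Omega$ be given, and choose $t \in [0, T]$. If $\tau(\omega) \le t$, then
$$\mathbb{E}\left[e^{-rT}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\Big|\mathcal{F}(t)\right](\omega) = 0.$$
But when $\tau(\omega) \le t$, we have
$$v(t \wedge \tau(\omega), S(t \wedge \tau(\omega), \omega)) = v(t \wedge \tau(\omega), L) = 0,$$
so we may write
$$\mathbb{E}\left[e^{-rT}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\Big|\mathcal{F}(t)\right](\omega) = e^{-r(t \wedge \tau(\omega))}v(t \wedge \tau(\omega), S(t \wedge \tau(\omega), \omega)).$$
On the other hand, if $\tau(\omega) > t$, then the Markov property implies
$$\begin{aligned}
\mathbb{E}\left[e^{-rT}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\Big|\mathcal{F}(t)\right](\omega)
&= \mathbb{E}^{t, S(t,\omega)}\left[e^{-rT}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\right] \\
&= e^{-rt}v(t, S(t, \omega)) \\
&= e^{-r(t \wedge \tau(\omega))}v(t \wedge \tau(\omega), S(t \wedge \tau(\omega), \omega)).
\end{aligned}$$

In both cases, we have
$$e^{-r(t \wedge \tau)}v(t \wedge \tau, S(t \wedge \tau)) = \mathbb{E}\left[e^{-rT}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\Big|\mathcal{F}(t)\right].$$
Suppose $0 \le u \le t \le T$. Then
$$\begin{aligned}
\mathbb{E}\left[e^{-r(t \wedge \tau)}v(t \wedge \tau, S(t \wedge \tau))\Big|\mathcal{F}(u)\right]
&= \mathbb{E}\left[\mathbb{E}\left[e^{-rT}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\Big|\mathcal{F}(t)\right]\Big|\mathcal{F}(u)\right] \\
&= \mathbb{E}\left[e^{-rT}(S(T) - K)^+ 1_{\{S^*(T) < L\}}\Big|\mathcal{F}(u)\right] \\
&= e^{-r(u \wedge \tau)}v(u \wedge \tau, S(u \wedge \tau)).
\end{aligned}$$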


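For $0 \le t \le T$, we compute the differential
$$d\left(e^{-rt}v(t, S(t))\right) = e^{-rt}\left(-rv + v_t + rSv_x + \tfrac12\sigma^2 S^2 v_{xx}\right)dt + e^{-rt}\sigma S v_x\,dB.$$
Integrate from 0 to $t \wedge \tau$:
$$\begin{aligned}
e^{-r(t \wedge \tau)}v(t \wedge \tau, S(t \wedge \tau)) &= v(0, S(0)) + \int_0^{t \wedge \tau}e^{-ru}\left(-rv + v_t + rSv_x + \tfrac12\sigma^2 S^2 v_{xx}\right)du \\
&\quad + \underbrace{\int_0^{t \wedge \tau}e^{-ru}\sigma S v_x\,dB}_{\text{A stopped martingale is still a martingale}}.
\end{aligned}$$

Because $e^{-r(t \wedge \tau)}v(t \wedge \tau, S(t \wedge \tau))$ is also a martingale, the Riemann integral
$$\int_0^{t \wedge \tau}e^{-ru}\left(-rv + v_t + rSv_x + \tfrac12\sigma^2 S^2 v_{xx}\right)du$$
is a martingale. Therefore,
$$-rv(u, S(u)) + v_t(u, S(u)) + rS(u)v_x(u, S(u)) + \tfrac12\sigma^2 S^2(u)v_{xx}(u, S(u)) = 0, \qquad 0 \le u \le t \wedge \tau.$$

The PDE
$$-rv + v_t + rxv_x + \tfrac12\sigma^2 x^2 v_{xx} = 0, \qquad 0 \le t \le T,\ 0 \le x \le L,$$
then follows.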


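The Hedge
$$d\left(e^{-rt}v(t, S(t))\right) = e^{-rt}\sigma S(t)v_x(t, S(t))\,dB(t), \qquad 0 \le t \le \tau.$$

Let $X(t)$ be the wealth process corresponding to some portfolio $\Delta(t)$. Then
$$d\left(e^{-rt}X(t)\right) = e^{-rt}\Delta(t)\sigma S(t)\,dB(t).$$

We should take
$$X(0) = v(0, S(0))$$
and
$$\Delta(t) = v_x(t, S(t)), \qquad 0 \le t \le T \wedge \tau.$$
Then
$$X(T \wedge \tau) = v(T \wedge \tau, S(T \wedge \tau)) = \begin{cases} v(T, S(T)) = (S(T) - K)^+ & \text{if } \tau > T, \\ v(\tau, L) = 0 & \text{if } \tau \le T. \end{cases}$$

[Figure 20.5: Practical issue. Top panel: $v(T, x)$ against $x$, with $0$, $K$ and $L$ marked on the axis. Bottom panel: $v(t, x)$ against $x$, with $0$, $K$ and $L$ marked on the axis.]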


20.3 A practical issue 


For $t < T$ but $t$ near $T$, $v(t, x)$ has the form shown in the bottom part of Fig. 20.5.
In particular, the hedging portfolio
$$\Delta(t) = v_x(t, S(t))$$
can become very negative near the knockout boundary. The hedger is in an unstable situation. He should take a large short position in the stock. If the stock does not cross the barrier $L$, he covers this short position with funds from the money market, pays off the option, and is left with zero. If the stock moves across the barrier, he is now in a region of $\Delta(t) = v_x(t, S(t))$ near zero. He should cover his short position with the money market. This is more expensive than before, because the stock price has risen, and consequently he is left with no money. However, the option has "knocked out", so no money is needed to pay it off.

Because a large short position is being taken, a small error in hedging can create a significant effect. Here is a possible resolution.

Rather than using the boundary condition
$$v(t, L) = 0, \qquad 0 \le t \le T,$$
solve the PDE with the boundary condition
$$v(t, L) + \alpha L v_x(t, L) = 0, \qquad 0 \le t \le T,$$
where $\alpha$ is a "tolerance parameter", say 1%. At the boundary, $Lv_x(t, L)$ is the dollar size of the short position. The new boundary condition guarantees:

1. $Lv_x(t, L)$ remains bounded;

2. The value of the portfolio is always sufficient to cover a hedging error of $\alpha$ times the dollar size of the short position.


Chapter 21 


Asian Options 


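Stock:
$$dS(t) = rS(t)\,dt + \sigma S(t)\,dB(t).$$

Payoff:
$$V = h\left(\int_0^T S(t)\,dt\right).$$

Value of the payoff at time zero:
$$X(0) = \mathbb{E}\left[e^{-rT}h\left(\int_0^T S(t)\,dt\right)\right].$$

Introduce an auxiliary process $Y(t)$ by specifying
$$dY(t) = S(t)\,dt.$$

With the initial conditions
$$S(t) = x, \qquad Y(t) = y,$$
we have the solutions
$$S(T) = x\exp\left\{\sigma(B(T) - B(t)) + (r - \tfrac12\sigma^2)(T - t)\right\}, \qquad Y(T) = y + \int_t^T S(u)\,du.$$

Define the undiscounted expected payoff
$$u(t, x, y) = \mathbb{E}^{t,x,y}\,h(Y(T)), \qquad 0 \le t \le T,\ x \ge 0,\ y \in \mathbb{R}.$$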
21.1 Feynman-Kac Theorem 


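The function $u$ satisfies the PDE
$$u_t + rxu_x + \tfrac12\sigma^2 x^2 u_{xx} + xu_y = 0, \qquad 0 \le t \le T,\ x \ge 0,\ y \in \mathbb{R},$$
the terminal condition
$$u(T, x, y) = h(y), \qquad x \ge 0,\ y \in \mathbb{R},$$
and the boundary condition
$$u(t, 0, y) = h(y), \qquad 0 \le t \le T,\ y \in \mathbb{R}.$$

One can solve this equation. Then
$$v\left(t, S(t), \int_0^t S(u)\,du\right)$$
is the option value at time $t$, where
$$v(t, x, y) = e^{-r(T - t)}u(t, x, y).$$

The PDE for $v$ is
$$-rv + v_t + rxv_x + \tfrac12\sigma^2 x^2 v_{xx} + xv_y = 0, \qquad (1.1)$$
$$v(T, x, y) = h(y),$$
$$v(t, 0, y) = e^{-r(T - t)}h(y).$$

One can solve this equation rather than the equation for $u$.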


21.2 Constructing the hedge 


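Start with the stock price $S(0)$. The differential of the value $X(t)$ of a portfolio $\Delta(t)$ is
$$\begin{aligned}
dX &= \Delta\,dS + r(X - \Delta S)\,dt \\
&= \Delta S(r\,dt + \sigma\,dB) + rX\,dt - r\Delta S\,dt \\
&= \Delta\sigma S\,dB + rX\,dt.
\end{aligned}$$

We want to have
$$X(t) = v\left(t, S(t), \int_0^t S(u)\,du\right),$$
so that
$$X(T) = v\left(T, S(T), \int_0^T S(u)\,du\right) = h\left(\int_0^T S(u)\,du\right).$$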


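The differential of the value of the option is
$$\begin{aligned}
dv\left(t, S(t), \int_0^t S(u)\,du\right) &= v_t\,dt + v_x\,dS + v_y S\,dt + \tfrac12 v_{xx}\,dS\,dS \\
&= \left(v_t + rSv_x + Sv_y + \tfrac12\sigma^2 S^2 v_{xx}\right)dt + \sigma S v_x\,dB \\
&= rv\,dt + v_x\,\sigma S(t)\,dB(t), \qquad \text{(From Eq. 1.1)}
\end{aligned}$$
where $v$ and its derivatives are evaluated at $\left(t, S(t), \int_0^t S(u)\,du\right)$.

Compare this with
$$dX(t) = rX(t)\,dt + \Delta(t)\sigma S(t)\,dB(t).$$

Take $\Delta(t) = v_x\left(t, S(t), \int_0^t S(u)\,du\right)$. If $X(0) = v(0, S(0), 0)$, then
$$X(t) = v\left(t, S(t), \int_0^t S(u)\,du\right), \qquad 0 \le t \le T,$$
because both these processes satisfy the same stochastic differential equation, starting from the same initial condition.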


21.3 Partial average payoff Asian option 


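Now suppose the payoff is
$$V = h\left(\int_\tau^T S(t)\,dt\right),$$
where $0 < \tau < T$. We compute
$$v(\tau, x, y) = \mathbb{E}^{\tau,x,y}\,e^{-r(T - \tau)}h(Y(T))$$
just as before. For $0 \le t \le \tau$, we compute next the value of a derivative security which pays off
$$v(\tau, S(\tau), 0)$$
at time $\tau$. This value is
$$w(t, x) = \mathbb{E}^{t,x}\,e^{-r(\tau - t)}v(\tau, S(\tau), 0).$$
The function $w$ satisfies the Black-Scholes PDE
$$-rw + w_t + rxw_x + \tfrac12\sigma^2 x^2 w_{xx} = 0, \qquad 0 \le t \le \tau,\ x \ge 0,$$
with terminal condition
$$w(\tau, x) = v(\tau, x, 0), \qquad x \ge 0,$$
and boundary condition
$$w(t, 0) = e^{-r(T - t)}h(0), \qquad 0 \le t \le \tau.$$

The hedge is given by
$$\Delta(t) = \begin{cases} w_x(t, S(t)), & 0 \le t \le \tau, \\ v_x\left(t, S(t), \int_\tau^t S(u)\,du\right), & \tau < t \le T. \end{cases}$$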


Remark 21.1 While no closed-form for the Asian option price is known, the Laplace transform (in the variable $\frac{\sigma^2}{4}(T - t)$) has been computed. See H. Geman and M. Yor, Bessel processes, Asian options, and perpetuities, Math. Finance 3 (1993), 349-375.


Chapter 22 


Summary of Arbitrage Pricing Theory 


A simple European derivative security makes a random payment at a time fixed in advance. The 
value at time t of such a security is the amount of wealth needed at time ¢ in order to replicate the 
security by trading in the market. The hedging portfolio is a specification of how to do this trading. 


22.1 Binomial model, Hedging Portfolio 


Let $\Omega$ be the set of all possible sequences of $n$ coin-tosses. We have no probabilities at this point. Let $r \ge 0$, $u > r + 1$, $d = 1/u$ be given. (See Fig. 2.1)

Evolution of the value of a portfolio:
$$X_{k+1} = \Delta_k S_{k+1} + (1 + r)(X_k - \Delta_k S_k).$$

Given a simple European derivative security $V(\omega_1, \omega_2)$, we want to start with a nonrandom $X_0$ and use a portfolio process
$$\Delta_0, \quad \Delta_1(H), \quad \Delta_1(T)$$
so that
$$X_2(\omega_1, \omega_2) = V(\omega_1, \omega_2) \quad \forall \omega_1, \omega_2. \qquad \text{(four equations)}$$

There are four unknowns: $X_0$, $\Delta_0$, $\Delta_1(H)$, $\Delta_1(T)$. Solving the equations, we obtain:


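$$X_1(\omega_1) = \frac{1}{1 + r}\Big[\frac{1 + r - d}{u - d}\underbrace{X_2(\omega_1, H)}_{V(\omega_1, H)} + \frac{u - (1 + r)}{u - d}\underbrace{X_2(\omega_1, T)}_{V(\omega_1, T)}\Big],$$
$$X_0 = \frac{1}{1 + r}\left[\frac{1 + r - d}{u - d}X_1(H) + \frac{u - (1 + r)}{u - d}X_1(T)\right],$$
$$\Delta_1(\omega_1) = \frac{X_2(\omega_1, H) - X_2(\omega_1, T)}{S_2(\omega_1, H) - S_2(\omega_1, T)},$$
$$\Delta_0 = \frac{X_1(H) - X_1(T)}{S_1(H) - S_1(T)}.$$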
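The four formulas can be tried on a small example. The sketch below is an illustration with hypothetical names: it assumes the payoff is given as a function of the stock prices $(S_1, S_2)$ along the path, computes $X_0$, $\Delta_0$, $\Delta_1$, and then rolls the wealth equation forward to confirm that the portfolio replicates the payoff on all four paths.

```python
def two_period_hedge(s0, u, d, r, payoff):
    """Solve the two-period replication problem.

    payoff(s1, s2) is an assumed interface: the claim V as a function of the
    stock prices at times 1 and 2 along the path.
    """
    p = (1.0 + r - d) / (u - d)
    q = (u - (1.0 + r)) / (u - d)
    factor = {"H": u, "T": d}
    s1 = {w1: s0 * factor[w1] for w1 in "HT"}
    s2 = {(w1, w2): s1[w1] * factor[w2] for w1 in "HT" for w2 in "HT"}
    x2 = {w: payoff(s1[w[0]], s2[w]) for w in s2}
    x1 = {w1: (p * x2[(w1, "H")] + q * x2[(w1, "T")]) / (1.0 + r) for w1 in "HT"}
    x0 = (p * x1["H"] + q * x1["T"]) / (1.0 + r)
    delta1 = {w1: (x2[(w1, "H")] - x2[(w1, "T")]) / (s2[(w1, "H")] - s2[(w1, "T")])
              for w1 in "HT"}
    delta0 = (x1["H"] - x1["T"]) / (s1["H"] - s1["T"])
    return x0, delta0, delta1, s1, s2, x2

def terminal_wealth(x0, delta0, delta1, s0, s1, s2, r, w):
    """Roll X_{k+1} = Delta_k S_{k+1} + (1+r)(X_k - Delta_k S_k) along the path w."""
    w1 = w[0]
    x1 = delta0 * s1[w1] + (1.0 + r) * (x0 - delta0 * s0)
    return delta1[w1] * s2[w] + (1.0 + r) * (x1 - delta1[w1] * s1[w1])

# Illustrative data: S_0 = 4, u = 2, d = 1/2, r = 1/4, call struck at 5.
s0, u, d, r = 4.0, 2.0, 0.5, 0.25
x0, delta0, delta1, s1, s2, x2 = two_period_hedge(
    s0, u, d, r, lambda s_1, s_2: max(s_2 - 5.0, 0.0))
errors = [abs(terminal_wealth(x0, delta0, delta1, s0, s1, s2, r, w) - x2[w]) for w in s2]
print(x0, delta0, delta1, max(errors))
```

No probabilities enter the replication check: the hedge is verified path by path, which is the point made in the text that follows.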


The probabilities of the stock price paths are irrelevant, because we have a hedge which works on every path. From a practical point of view, what matters is that the paths in the model include all the possibilities. We want to find a description of the paths in the model. They all have the property
$$(\log S_{k+1} - \log S_k)^2 = \left(\log\frac{S_{k+1}}{S_k}\right)^2 = (\pm\log u)^2 = (\log u)^2.$$
Let $\sigma = \log u > 0$. Then
$$\sum_{k=0}^{n-1}(\log S_{k+1} - \log S_k)^2 = \sigma^2 n.$$

The paths of $\log S_k$ accumulate quadratic variation at rate $\sigma^2$ per unit time.
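The path-independence of the quadratic variation is easy to confirm by direct computation. The sketch below is illustrative only, with a hypothetical function name and arbitrary parameter values; it sums the squared log-increments along a randomly generated toss sequence and compares the result with $\sigma^2 n$.

```python
import math
import random

def log_quadratic_variation(u, tosses, s0=1.0):
    """Sum of squared increments of log S_k along a coin-toss path, with d = 1/u."""
    d = 1.0 / u
    total, s = 0.0, s0
    for toss in tosses:
        s_next = s * (u if toss == "H" else d)
        total += (math.log(s_next) - math.log(s)) ** 2
        s = s_next
    return total

random.seed(0)                       # any seed gives the same quadratic variation
u, n = 1.1, 50                       # illustrative values
sigma = math.log(u)
tosses = [random.choice("HT") for _ in range(n)]
qv_random_path = log_quadratic_variation(u, tosses)
qv_all_heads = log_quadratic_variation(u, "H" * n)
print(qv_random_path, qv_all_heads, sigma**2 * n)
```

Only the number of steps and $u$ matter; the particular sequence of heads and tails does not.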


If we change u, then we change o, and the pricing and hedging formulas on the previous page will 
give different results. 


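We reiterate that the probabilities are only introduced as an aid to understanding and computation. Recall:
$$X_{k+1} = \Delta_k S_{k+1} + (1 + r)(X_k - \Delta_k S_k).$$

Define
$$\beta_k = (1 + r)^k.$$
Then
$$\frac{X_{k+1}}{\beta_{k+1}} = \Delta_k\frac{S_{k+1}}{\beta_{k+1}} + \frac{X_k}{\beta_k} - \Delta_k\frac{S_k}{\beta_k},$$
i.e.,
$$\frac{X_{k+1}}{\beta_{k+1}} - \frac{X_k}{\beta_k} = \Delta_k\left(\frac{S_{k+1}}{\beta_{k+1}} - \frac{S_k}{\beta_k}\right).$$

In continuous time, we will have the analogous equation
$$d\left(\frac{X(t)}{\beta(t)}\right) = \Delta(t)\,d\left(\frac{S(t)}{\beta(t)}\right).$$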


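If we introduce a probability measure $\widetilde{\mathbb{P}}$ under which $\frac{S_k}{\beta_k}$ is a martingale, then $\frac{X_k}{\beta_k}$ will also be a martingale, regardless of the portfolio used. Indeed,
$$\begin{aligned}
\widetilde{\mathbb{E}}\left[\frac{X_{k+1}}{\beta_{k+1}}\Big|\mathcal{F}_k\right] &= \widetilde{\mathbb{E}}\left[\frac{X_k}{\beta_k} + \Delta_k\left(\frac{S_{k+1}}{\beta_{k+1}} - \frac{S_k}{\beta_k}\right)\Big|\mathcal{F}_k\right] \\
&= \frac{X_k}{\beta_k} + \Delta_k\underbrace{\left(\widetilde{\mathbb{E}}\left[\frac{S_{k+1}}{\beta_{k+1}}\Big|\mathcal{F}_k\right] - \frac{S_k}{\beta_k}\right)}_{=0} \\
&= \frac{X_k}{\beta_k}.
\end{aligned}$$

Suppose we want to have $X_2 = V$, where $V$ is some $\mathcal{F}_2$-measurable random variable. Then we must have
$$\frac{1}{1 + r}X_1 = \frac{X_1}{\beta_1} = \widetilde{\mathbb{E}}\left[\frac{X_2}{\beta_2}\Big|\mathcal{F}_1\right] = \widetilde{\mathbb{E}}\left[\frac{V}{\beta_2}\Big|\mathcal{F}_1\right],$$
$$X_0 = \frac{X_0}{\beta_0} = \widetilde{\mathbb{E}}\left[\frac{X_1}{\beta_1}\right] = \widetilde{\mathbb{E}}\left[\frac{V}{\beta_2}\right].$$

To find the risk-neutral probability measure $\widetilde{\mathbb{P}}$ under which $\frac{S_k}{\beta_k}$ is a martingale, we denote $\tilde p = \widetilde{\mathbb{P}}\{\omega_k = H\}$, $\tilde q = \widetilde{\mathbb{P}}\{\omega_k = T\}$, and compute
$$\widetilde{\mathbb{E}}\left[\frac{S_{k+1}}{\beta_{k+1}}\Big|\mathcal{F}_k\right] = \tilde p u\frac{S_k}{\beta_{k+1}} + \tilde q d\frac{S_k}{\beta_{k+1}} = \frac{1}{1 + r}[\tilde p u + \tilde q d]\frac{S_k}{\beta_k}.$$

We need to choose $\tilde p$ and $\tilde q$ so that
$$\tilde p u + \tilde q d = 1 + r, \qquad \tilde p + \tilde q = 1.$$

The solution of these equations is
$$\tilde p = \frac{1 + r - d}{u - d}, \qquad \tilde q = \frac{u - (1 + r)}{u - d}.$$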
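The two linear equations have the solution $\tilde p = \frac{1 + r - d}{u - d}$, $\tilde q = \frac{u - (1 + r)}{u - d}$, the same weights that appear in the formulas for $X_1$ and $X_0$. The sketch below checks both equations and the one-step martingale property of the discounted stock price; the function name and the numbers are illustrative assumptions.

```python
def risk_neutral_probabilities(u, d, r):
    """Solve p*u + q*d = 1 + r, p + q = 1, assuming d < 1 + r < u."""
    if not d < 1.0 + r < u:
        raise ValueError("need d < 1 + r < u for probabilities strictly between 0 and 1")
    p = (1.0 + r - d) / (u - d)
    q = (u - (1.0 + r)) / (u - d)
    return p, q

u, d, r = 2.0, 0.5, 0.25      # illustrative values
p, q = risk_neutral_probabilities(u, d, r)

# One-step check: the discounted price S_k / beta_k is a martingale.
s_k, k = 4.0, 3
beta_k, beta_next = (1.0 + r) ** k, (1.0 + r) ** (k + 1)
expected_next = p * (u * s_k) / beta_next + q * (d * s_k) / beta_next
print(p, q, expected_next, s_k / beta_k)
```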


22.2 Setting up the continuous model 


Now the stock price $S(t)$, $0 \le t \le T$, is a continuous function of $t$. We would like to hedge along every possible path of $S(t)$, but that is impossible. Using the binomial model as a guide, we choose $\sigma > 0$ and try to hedge along every path $S(t)$ for which the quadratic variation of $\log S(t)$ accumulates at rate $\sigma^2$ per unit time. These are the paths with volatility $\sigma^2$.

To generate these paths, we use Brownian motion, rather than coin-tossing. To introduce Brownian motion, we need a probability measure. However, the only thing about this probability measure which ultimately matters is the set of paths to which it assigns probability zero.


Let $B(t)$, $0 \le t \le T$, be a Brownian motion defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. For any $\mu \in \mathbb{R}$, the paths of
$$\mu t + \sigma B(t)$$
accumulate quadratic variation at rate $\sigma^2$ per unit time. We want to define
$$S(t) = S(0)\exp\{\mu t + \sigma B(t)\},$$
so that the paths of
$$\log S(t) = \log S(0) + \mu t + \sigma B(t)$$
accumulate quadratic variation at rate $\sigma^2$ per unit time. Surprisingly, the choice of $\mu$ in this definition is irrelevant. Roughly, the reason for this is the following: Choose $\omega_1 \in \Omega$. Then, for $\mu_1 \in \mathbb{R}$,
$$\mu_1 t + \sigma B(t, \omega_1), \qquad 0 \le t \le T,$$
is a continuous function of $t$. If we replace $\mu_1$ by $\mu_2$, then $\mu_2 t + \sigma B(t, \omega_1)$ is a different function. However, there is an $\omega_2 \in \Omega$ such that
$$\mu_1 t + \sigma B(t, \omega_1) = \mu_2 t + \sigma B(t, \omega_2), \qquad 0 \le t \le T.$$

In other words, regardless of whether we use $\mu_1$ or $\mu_2$ in the definition of $S(t)$, we will see the same paths. The mathematically precise statement is the following:

If a set of stock price paths has a positive probability when $S(t)$ is defined by
$$S(t) = S(0)\exp\{\mu_1 t + \sigma B(t)\},$$
then this set of paths has positive probability when $S(t)$ is defined by
$$S(t) = S(0)\exp\{\mu_2 t + \sigma B(t)\}.$$

Since we are interested in hedging along every path, except possibly for a set of paths which has probability zero, the choice of $\mu$ is irrelevant.

The most convenient choice of $\mu$ is
$$\mu = r - \tfrac12\sigma^2,$$
so
$$S(t) = S(0)\exp\{rt + \sigma B(t) - \tfrac12\sigma^2 t\},$$
and
$$e^{-rt}S(t) = S(0)\exp\{\sigma B(t) - \tfrac12\sigma^2 t\}$$
is a martingale under $\mathbb{P}$. With this choice of $\mu$,
$$dS(t) = rS(t)\,dt + \sigma S(t)\,dB(t)$$
and $\mathbb{P}$ is the risk-neutral measure. If a different choice of $\mu$ is made, we have
$$S(t) = S(0)\exp\{\mu t + \sigma B(t)\},$$
$$\begin{aligned}
dS(t) &= (\mu + \tfrac12\sigma^2)S(t)\,dt + \sigma S(t)\,dB(t) \\
&= rS(t)\,dt + \sigma S(t)\underbrace{\left[\frac{\mu + \tfrac12\sigma^2 - r}{\sigma}\,dt + dB(t)\right]}_{d\widetilde{B}(t)}.
\end{aligned}$$

$\widetilde{B}$ has the same paths as $B$. We can change to the risk-neutral measure $\widetilde{\mathbb{P}}$, under which $\widetilde{B}$ is a Brownian motion, and then proceed as if $\mu$ had been chosen to be equal to $r - \tfrac12\sigma^2$.


22.3 Risk-neutral pricing and hedging 


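Let $\widetilde{\mathbb{P}}$ denote the risk-neutral measure. Then
$$dS(t) = rS(t)\,dt + \sigma S(t)\,d\widetilde{B}(t),$$
where $\widetilde{B}$ is a Brownian motion under $\widetilde{\mathbb{P}}$. Set
$$\beta(t) = e^{rt}.$$
Then
$$d\left(\frac{S(t)}{\beta(t)}\right) = \sigma\frac{S(t)}{\beta(t)}\,d\widetilde{B}(t),$$
so $\frac{S(t)}{\beta(t)}$ is a martingale under $\widetilde{\mathbb{P}}$.

Evolution of the value of a portfolio:
$$dX(t) = \Delta(t)\,dS(t) + r(X(t) - \Delta(t)S(t))\,dt, \qquad (3.1)$$
which is equivalent to
$$d\left(\frac{X(t)}{\beta(t)}\right) = \Delta(t)\,d\left(\frac{S(t)}{\beta(t)}\right) = \Delta(t)\sigma\frac{S(t)}{\beta(t)}\,d\widetilde{B}(t). \qquad (3.2)$$

Regardless of the portfolio used, $\frac{X(t)}{\beta(t)}$ is a martingale under $\widetilde{\mathbb{P}}$.

Now suppose $V$ is a given $\mathcal{F}(T)$-measurable random variable, the payoff of a simple European derivative security. We want to find the portfolio process $\Delta(t)$, $0 \le t \le T$, and initial portfolio value $X(0)$ so that $X(T) = V$. Because $\frac{X(t)}{\beta(t)}$ must be a martingale, we must have
$$\frac{X(t)}{\beta(t)} = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(t)\right], \qquad 0 \le t \le T. \qquad (3.3)$$

This is the risk-neutral pricing formula. We have the following sequence: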


1. $V$ is given,

2. Define $X(t)$, $0 \le t \le T$, by (3.3) (not by (3.1) or (3.2), because we do not yet have $\Delta(t)$).

3. Construct $\Delta(t)$ so that (3.2) (or equivalently, (3.1)) is satisfied by the $X(t)$, $0 \le t \le T$, defined in step 2.


To carry out step 3, we first use the tower property to show that $\frac{X(t)}{\beta(t)}$ defined by (3.3) is a martingale under $\widetilde{\mathbb{P}}$. We next use the corollary to the Martingale Representation Theorem (Homework Problem 4.5) to show that
$$d\left(\frac{X(t)}{\beta(t)}\right) = \gamma(t)\,d\widetilde{B}(t) \qquad (3.4)$$
for some process $\gamma$. Comparing (3.4), which we know, and (3.2), which we want, we decide to define
$$\Delta(t) = \frac{\beta(t)\gamma(t)}{\sigma S(t)}. \qquad (3.5)$$

Then (3.4) implies (3.2), which implies (3.1), which implies that $X(t)$, $0 \le t \le T$, is the value of the portfolio process $\Delta(t)$, $0 \le t \le T$.


From (3.3), the definition of $X$, we see that the hedging portfolio must begin with value
$$X(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right],$$
and it will end with value
$$X(T) = \beta(T)\,\widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(T)\right] = \beta(T)\frac{V}{\beta(T)} = V.$$

Remark 22.1 Although we have taken $r$ and $\sigma$ to be constant, the risk-neutral pricing formula is still "valid" when $r$ and $\sigma$ are processes adapted to the filtration generated by $B$. If they depend on either $\widetilde{B}$ or on $S$, they are adapted to the filtration generated by $B$. The "validity" of the risk-neutral pricing formula means:

1. If you start with
$$X(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right],$$
then there is a hedging portfolio $\Delta(t)$, $0 \le t \le T$, such that $X(T) = V$;

2. At each time $t$, the value $X(t)$ of the hedging portfolio in 1 satisfies
$$\frac{X(t)}{\beta(t)} = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\Big|\mathcal{F}(t)\right].$$


Remark 22.2 In general, when there are multiple assets and/or multiple Brownian motions, the risk-neutral pricing formula is valid provided there is a unique risk-neutral measure. A probability measure is said to be risk-neutral provided

• it has the same probability-zero sets as the original measure;

• it makes all the discounted asset prices be martingales.

To see if the risk-neutral measure is unique, compute the differential of all discounted asset prices and check if there is more than one way to define $\widetilde{B}$ so that all these differentials have only $d\widetilde{B}$ terms.


22.4 Implementation of risk-neutral pricing and hedging 


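To get a computable result from the general risk-neutral pricing formula

\[ \frac{X(t)}{\beta(t)} = \widetilde{\mathbb{E}}\left[ \frac{V}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right], \]

one uses the Markov property. We need to identify some state variables, the stock price and possibly
other variables, so that

\[ X(t) = \beta(t)\, \widetilde{\mathbb{E}}\left[ \frac{V}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right] \]

is a function of these variables.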


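Example 22.1 Assume $r$ and $\sigma$ are constant, and $V = h(S(T))$. We can take the stock price to be the state
variable. Define

\[ v(t, x) = \widetilde{\mathbb{E}}^{t,x}\left[ e^{-r(T-t)} h(S(T)) \right]. \]

Then

\[ X(t) = e^{rt}\, \widetilde{\mathbb{E}}\left[ e^{-rT} h(S(T)) \,\middle|\, \mathcal{F}(t) \right] = v(t, S(t)), \]

and $\frac{X(t)}{\beta(t)} = e^{-rt} v(t, S(t))$ is a martingale under $\widetilde{\mathbb{P}}$.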
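As an illustration of Example 22.1, the following is a minimal sketch that evaluates $v(t,x)$ numerically for constant $r$ and $\sigma$, assuming the usual lognormal form of $S(T)$ under the risk-neutral measure. The function names (`norm_cdf`, `bs_call`, `v`) and the trapezoid-rule quadrature are choices made for this sketch, not part of the notes; the call payoff is used only because its closed-form value gives a check.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, strike, r, sigma, tau):
    """Black-Scholes call value with time to expiration tau (used as a check)."""
    d_plus = (math.log(x / strike) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    return x * norm_cdf(d_plus) - strike * math.exp(-r * tau) * norm_cdf(d_minus)


def v(t, x, payoff, r, sigma, big_t, n=20001, z_max=10.0):
    """Approximate E~[exp(-r (T - t)) h(S(T)) | S(t) = x].

    Assumes S(T) = x exp((r - sigma^2 / 2)(T - t) + sigma sqrt(T - t) Z) with
    Z standard normal, and integrates over Z with the trapezoid rule.
    """
    tau = big_t - t
    dz = 2.0 * z_max / (n - 1)
    total = 0.0
    for i in range(n):
        z = -z_max + i * dz
        s_end = x * math.exp((r - 0.5 * sigma ** 2) * tau + sigma * math.sqrt(tau) * z)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * payoff(s_end) * math.exp(-0.5 * z * z)
    return math.exp(-r * tau) * total * dz / math.sqrt(2.0 * math.pi)
```

With `payoff = lambda s: max(s - 100.0, 0.0)`, the value `v(0.0, 100.0, payoff, 0.05, 0.2, 1.0)` should agree closely with `bs_call(100.0, 100.0, 0.05, 0.2, 1.0)`.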


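Example 22.2 Assume $r$ and $\sigma$ are constant.

\[ V = h\left( \int_0^T S(u)\, du \right). \]

Take $S(t)$ and $Y(t) = \int_0^t S(u)\, du$ to be the state variables. Define

\[ v(t, x, y) = \widetilde{\mathbb{E}}^{t,x,y}\left[ e^{-r(T-t)} h(Y(T)) \right], \]

where

\[ Y(T) = y + \int_t^T S(u)\, du. \]

Then

\[ X(t) = e^{rt}\, \widetilde{\mathbb{E}}\left[ e^{-rT} h(Y(T)) \,\middle|\, \mathcal{F}(t) \right] = v(t, S(t), Y(t)) \]

and

\[ \frac{X(t)}{\beta(t)} = e^{-rt} v(t, S(t), Y(t)) \]

is a martingale under $\widetilde{\mathbb{P}}$.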


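Example 22.3 (Homework problem 4.2)

\[ dS(t) = r(t, Y(t))\, S(t)\, dt + \sigma(t, Y(t))\, S(t)\, d\widetilde{B}(t), \]
\[ dY(t) = \alpha(t, Y(t))\, dt + \gamma(t, Y(t))\, d\widetilde{B}(t), \]
\[ V = h(S(T)). \]

Take $S(t)$ and $Y(t)$ to be the state variables. Define

\[ v(t, x, y) = \widetilde{\mathbb{E}}^{t,x,y}\left[ \exp\left\{ -\int_t^T r(u, Y(u))\, du \right\} h(S(T)) \right]. \]

Then

\[ X(t) = \beta(t)\, \widetilde{\mathbb{E}}\left[ \frac{V}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right]
= \widetilde{\mathbb{E}}\left[ \exp\left\{ -\int_t^T r(u, Y(u))\, du \right\} h(S(T)) \,\middle|\, \mathcal{F}(t) \right]
= v(t, S(t), Y(t)), \]

and

\[ \frac{X(t)}{\beta(t)} = \exp\left\{ -\int_0^t r(u, Y(u))\, du \right\} v(t, S(t), Y(t)) \]

is a martingale under $\widetilde{\mathbb{P}}$.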


In every case, we get an expression involving $v$ to be a martingale. We take the differential and
set the $dt$ term to zero. This gives us a partial differential equation for $v$, and this equation must
hold wherever the state processes can be. The $d\widetilde{B}$ term in the differential of the equation is the
differential of a martingale, and since the martingale is

\[ \frac{X(t)}{\beta(t)} = X(0) + \int_0^t \Delta(u) \frac{\sigma(u) S(u)}{\beta(u)}\, d\widetilde{B}(u), \]

we can solve for $\Delta(t)$. This is the argument which uses (3.4) to obtain (3.5).


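Example 22.4 (Continuation of Example 22.3)

\[ \frac{X(t)}{\beta(t)} = \exp\left\{ -\int_0^t r(u, Y(u))\, du \right\} v(t, S(t), Y(t)) \]

is a martingale under $\widetilde{\mathbb{P}}$. We have

\[
\begin{aligned}
d\left( \frac{X(t)}{\beta(t)} \right)
&= \frac{1}{\beta(t)} \Big[ -r(t, Y(t))\, v(t, S(t), Y(t))\, dt + v_t\, dt + v_x\, dS + v_y\, dY \\
&\qquad\qquad + \tfrac12 v_{xx}\, dS\, dS + v_{xy}\, dS\, dY + \tfrac12 v_{yy}\, dY\, dY \Big] \\
&= \frac{1}{\beta(t)} \Big[ \big( -rv + v_t + rS v_x + \alpha v_y + \tfrac12 \sigma^2 S^2 v_{xx} + \sigma\gamma S v_{xy} + \tfrac12 \gamma^2 v_{yy} \big)\, dt \\
&\qquad\qquad + ( \sigma S v_x + \gamma v_y )\, d\widetilde{B} \Big].
\end{aligned}
\]

The partial differential equation satisfied by $v$ is

\[ -rv + v_t + r x v_x + \alpha v_y + \tfrac12 \sigma^2 x^2 v_{xx} + \sigma\gamma x v_{xy} + \tfrac12 \gamma^2 v_{yy} = 0, \]

where it should be noted that $v = v(t, x, y)$, and all other variables are functions of $(t, y)$. We have

\[ d\left( \frac{X(t)}{\beta(t)} \right) = \frac{1}{\beta(t)} \left[ \sigma S v_x + \gamma v_y \right] d\widetilde{B}(t), \]

where $\sigma = \sigma(t, Y(t))$, $\gamma = \gamma(t, Y(t))$, $v = v(t, S(t), Y(t))$, and $S = S(t)$. We want to choose $\Delta(t)$ so that

\[ d\left( \frac{X(t)}{\beta(t)} \right) = \Delta(t)\, \sigma(t, Y(t)) \frac{S(t)}{\beta(t)}\, d\widetilde{B}(t). \]

Therefore, we should take $\Delta(t)$ to be

\[ \Delta(t) = v_x(t, S(t), Y(t)) + \frac{\gamma(t, Y(t))}{\sigma(t, Y(t))\, S(t)}\, v_y(t, S(t), Y(t)). \]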


Chapter 23 


Recognizing a Brownian Motion 


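Theorem 0.62 (Levy) Let $B(t)$, $0 \le t \le T$, be a process on $(\Omega, \mathcal{F}, \mathbb{P})$, adapted to a filtration
$\mathcal{F}(t)$, $0 \le t \le T$, such that:

1. the paths of $B(t)$ are continuous,

2. $B$ is a martingale,

3. $\langle B \rangle(t) = t$, $0 \le t \le T$, (i.e., informally $dB(t)\, dB(t) = dt$).

Then $B$ is a Brownian motion.

Proof: (Idea) Let $0 \le s < t \le T$ be given. We need to show that $B(t) - B(s)$ is normal, with
mean zero and variance $t - s$, and $B(t) - B(s)$ is independent of $\mathcal{F}(s)$. We shall show that the
conditional moment generating function of $B(t) - B(s)$ is

\[ \mathbb{E}\left[ e^{u(B(t) - B(s))} \,\middle|\, \mathcal{F}(s) \right] = e^{\frac12 u^2 (t - s)}. \]

Since the moment generating function characterizes the distribution, this shows that $B(t) - B(s)$
is normal with mean 0 and variance $t - s$, and conditioning on $\mathcal{F}(s)$ does not affect this, i.e.,
$B(t) - B(s)$ is independent of $\mathcal{F}(s)$.

We compute (this uses the continuity condition (1) of the theorem)

\[ d e^{uB(t)} = u e^{uB(t)}\, dB(t) + \tfrac12 u^2 e^{uB(t)}\, dB(t)\, dB(t), \]

so, replacing $dB(t)\, dB(t)$ by $dt$ (this uses condition 3),

\[ e^{uB(t)} = e^{uB(s)} + \int_s^t u e^{uB(v)}\, dB(v) + \tfrac12 u^2 \int_s^t e^{uB(v)}\, dv. \]

Now $\int_0^t u e^{uB(v)}\, dB(v)$ is a martingale (by condition 2), and so

\[ \mathbb{E}\left[ \int_s^t u e^{uB(v)}\, dB(v) \,\middle|\, \mathcal{F}(s) \right]
= -\int_0^s u e^{uB(v)}\, dB(v) + \mathbb{E}\left[ \int_0^t u e^{uB(v)}\, dB(v) \,\middle|\, \mathcal{F}(s) \right] = 0. \]

It follows that

\[ \mathbb{E}\left[ e^{uB(t)} \,\middle|\, \mathcal{F}(s) \right] = e^{uB(s)} + \tfrac12 u^2 \int_s^t \mathbb{E}\left[ e^{uB(v)} \,\middle|\, \mathcal{F}(s) \right] dv. \]

We define

\[ \varphi(v) = \mathbb{E}\left[ e^{uB(v)} \,\middle|\, \mathcal{F}(s) \right], \]

so that

\[ \varphi(s) = e^{uB(s)} \]

and

\[ \varphi(t) = e^{uB(s)} + \tfrac12 u^2 \int_s^t \varphi(v)\, dv, \qquad \varphi'(t) = \tfrac12 u^2 \varphi(t), \qquad \varphi(t) = k e^{\frac12 u^2 t}. \]

Plugging in $s$, we get

\[ e^{uB(s)} = k e^{\frac12 u^2 s} \implies k = e^{uB(s) - \frac12 u^2 s}. \]

Therefore,

\[ \mathbb{E}\left[ e^{uB(t)} \,\middle|\, \mathcal{F}(s) \right] = \varphi(t) = e^{uB(s) + \frac12 u^2 (t - s)}, \qquad
\mathbb{E}\left[ e^{u(B(t) - B(s))} \,\middle|\, \mathcal{F}(s) \right] = e^{\frac12 u^2 (t - s)}. \]
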
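23.1 Identifying volatility and correlation


Let $B_1$ and $B_2$ be independent Brownian motions and

\[ \frac{dS_1}{S_1} = r\, dt + \sigma_{11}\, dB_1 + \sigma_{12}\, dB_2, \]
\[ \frac{dS_2}{S_2} = r\, dt + \sigma_{21}\, dB_1 + \sigma_{22}\, dB_2. \]

Define

\[ \sigma_1 = \sqrt{\sigma_{11}^2 + \sigma_{12}^2}, \qquad
\sigma_2 = \sqrt{\sigma_{21}^2 + \sigma_{22}^2}, \qquad
\rho = \frac{\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22}}{\sigma_1 \sigma_2}. \]

Define processes $W_1$ and $W_2$ by

\[ dW_1 = \frac{\sigma_{11}\, dB_1 + \sigma_{12}\, dB_2}{\sigma_1}, \qquad
dW_2 = \frac{\sigma_{21}\, dB_1 + \sigma_{22}\, dB_2}{\sigma_2}. \]

Then $W_1$ and $W_2$ have continuous paths, are martingales, and

\[ dW_1\, dW_1 = \frac{1}{\sigma_1^2} (\sigma_{11}\, dB_1 + \sigma_{12}\, dB_2)^2
= \frac{1}{\sigma_1^2} (\sigma_{11}^2\, dB_1\, dB_1 + \sigma_{12}^2\, dB_2\, dB_2) = dt, \]

and similarly

\[ dW_2\, dW_2 = dt. \]

Therefore, $W_1$ and $W_2$ are Brownian motions. The stock prices have the representation

\[ \frac{dS_1}{S_1} = r\, dt + \sigma_1\, dW_1, \qquad \frac{dS_2}{S_2} = r\, dt + \sigma_2\, dW_2. \]

The Brownian motions $W_1$ and $W_2$ are correlated. Indeed,

\[ dW_1\, dW_2 = \frac{1}{\sigma_1 \sigma_2} (\sigma_{11}\, dB_1 + \sigma_{12}\, dB_2)(\sigma_{21}\, dB_1 + \sigma_{22}\, dB_2)
= \frac{1}{\sigma_1 \sigma_2} (\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22})\, dt = \rho\, dt. \]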
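The definitions of $\sigma_1$, $\sigma_2$ and $\rho$ above can be checked on numbers. The following is a small sketch; the function name `vol_and_corr` and the sample loadings are illustrative assumptions, not part of the notes.

```python
import math


def vol_and_corr(s11, s12, s21, s22):
    """Return (sigma1, sigma2, rho) for the loadings sigma_ij of Section 23.1."""
    sigma1 = math.hypot(s11, s12)   # sqrt(s11^2 + s12^2)
    sigma2 = math.hypot(s21, s22)   # sqrt(s21^2 + s22^2)
    rho = (s11 * s21 + s12 * s22) / (sigma1 * sigma2)
    return sigma1, sigma2, rho
```

For example, the loadings (3, 4) and (4, 3) give both volatilities equal to 5 and correlation 24/25.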


23.2 Reversing the process 


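Suppose we are given that

\[ \frac{dS_1}{S_1} = r\, dt + \sigma_1\, dW_1, \qquad \frac{dS_2}{S_2} = r\, dt + \sigma_2\, dW_2, \]

where $W_1$ and $W_2$ are Brownian motions with correlation coefficient $\rho$. We want to find

\[ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \]

so that

\[
\Sigma \Sigma' =
\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}
\begin{pmatrix} \sigma_{11} & \sigma_{21} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}
= \begin{pmatrix} \sigma_{11}^2 + \sigma_{12}^2 & \sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22} \\ \sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22} & \sigma_{21}^2 + \sigma_{22}^2 \end{pmatrix}
= \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.
\]

A simple (but not unique) solution is (see Chapter 19)

\[ \sigma_{11} = \sigma_1, \quad \sigma_{12} = 0, \quad \sigma_{21} = \rho\sigma_2, \quad \sigma_{22} = \sqrt{1 - \rho^2}\, \sigma_2. \]

This corresponds to

\[ \sigma_1\, dW_1 = \sigma_1\, dB_1 \implies dB_1 = dW_1, \]
\[ \sigma_2\, dW_2 = \rho\sigma_2\, dB_1 + \sqrt{1 - \rho^2}\, \sigma_2\, dB_2 \implies dB_2 = \frac{dW_2 - \rho\, dW_1}{\sqrt{1 - \rho^2}}, \quad (\rho \ne \pm 1). \]

If $\rho = \pm 1$, then there is no $B_2$ and $dW_2 = \rho\, dB_1 = \rho\, dW_1$.

Continuing in the case $\rho \ne \pm 1$, we have

\[ dB_1\, dB_1 = dW_1\, dW_1 = dt, \]
\[ dB_2\, dB_2 = \frac{1}{1 - \rho^2} \left( dW_2\, dW_2 - 2\rho\, dW_1\, dW_2 + \rho^2\, dW_1\, dW_1 \right)
= \frac{1}{1 - \rho^2} \left( dt - 2\rho^2\, dt + \rho^2\, dt \right) = dt, \]

so both $B_1$ and $B_2$ are Brownian motions. Furthermore,

\[ dB_1\, dB_2 = \frac{1}{\sqrt{1 - \rho^2}} \left( dW_1\, dW_2 - \rho\, dW_1\, dW_1 \right)
= \frac{1}{\sqrt{1 - \rho^2}} (\rho\, dt - \rho\, dt) = 0. \]

We can now apply an extension of Levy's Theorem, which says that Brownian motions with zero
cross-variation are independent, to conclude that $B_1$, $B_2$ are independent Brownian motions.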
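The reversal can be sketched in a few lines: build the simple lower-triangular solution for $\Sigma$, confirm that $\Sigma\Sigma'$ reproduces the target covariance, and undo the mixing of increments. The helper names below are hypothetical and the inputs are sample values chosen for illustration.

```python
import math


def loading_matrix(sigma1, sigma2, rho):
    """Simple solution: sigma11 = sigma1, sigma12 = 0,
    sigma21 = rho sigma2, sigma22 = sqrt(1 - rho^2) sigma2 (requires |rho| < 1)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("this construction needs -1 < rho < 1")
    return [[sigma1, 0.0],
            [rho * sigma2, math.sqrt(1.0 - rho * rho) * sigma2]]


def times_transpose(m):
    """Return the 2x2 product m m'."""
    return [[sum(m[i][k] * m[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


def recover_independent_increments(dw1, dw2, rho):
    """Invert dW1 = dB1, dW2 = rho dB1 + sqrt(1 - rho^2) dB2."""
    db1 = dw1
    db2 = (dw2 - rho * dw1) / math.sqrt(1.0 - rho * rho)
    return db1, db2
```

The product `times_transpose(loading_matrix(0.2, 0.3, 0.5))` should have diagonal entries 0.04 and 0.09 and off-diagonal entries 0.03.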




Chapter 24 


An outside barrier option 


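Barrier process:

\[ \frac{dY(t)}{Y(t)} = \lambda\, dt + \sigma_1\, dB_1(t). \]

Stock process:

\[ \frac{dS(t)}{S(t)} = \mu\, dt + \rho\sigma_2\, dB_1(t) + \sqrt{1 - \rho^2}\, \sigma_2\, dB_2(t), \]

where $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$, and $B_1$ and $B_2$ are independent Brownian motions on some
$(\Omega, \mathcal{F}, \mathbb{P})$. The option pays off:

\[ (S(T) - K)^+ \mathbf{1}_{\{Y^*(T) < L\}} \]

at time $T$, where

\[ 0 < S(0) < K, \qquad 0 < Y(0) < L, \qquad Y^*(T) = \max_{0 \le t \le T} Y(t). \]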


Remark 24.1 The option payoff depends on both the Y and S processes. In order to hedge it, we 
will need the money market and two other assets, which we take to be Y and S. The risk-neutral 
measure must make the discounted value of every traded asset be a martingale, which in this case 
means the discounted Y and S processes.


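We want to find $\theta_1$ and $\theta_2$ and define

\[ d\widetilde{B}_1 = \theta_1\, dt + dB_1, \qquad d\widetilde{B}_2 = \theta_2\, dt + dB_2, \]

so that

\[ \frac{dY}{Y} = r\, dt + \sigma_1\, d\widetilde{B}_1 = r\, dt + \sigma_1\theta_1\, dt + \sigma_1\, dB_1, \]

\[
\begin{aligned}
\frac{dS}{S} &= r\, dt + \rho\sigma_2\, d\widetilde{B}_1 + \sqrt{1 - \rho^2}\, \sigma_2\, d\widetilde{B}_2 \\
&= r\, dt + \rho\sigma_2\theta_1\, dt + \sqrt{1 - \rho^2}\, \sigma_2\theta_2\, dt + \rho\sigma_2\, dB_1 + \sqrt{1 - \rho^2}\, \sigma_2\, dB_2.
\end{aligned}
\]

We must have

\[ \lambda = r + \sigma_1\theta_1, \tag{0.1} \]
\[ \mu = r + \rho\sigma_2\theta_1 + \sqrt{1 - \rho^2}\, \sigma_2\theta_2. \tag{0.2} \]

We solve to get

\[ \theta_1 = \frac{\lambda - r}{\sigma_1}, \qquad \theta_2 = \frac{\mu - r - \rho\sigma_2\theta_1}{\sqrt{1 - \rho^2}\, \sigma_2}. \]

We shall see that the formulas for $\theta_1$ and $\theta_2$ do not matter. What matters is that (0.1) and (0.2)
uniquely determine $\theta_1$ and $\theta_2$. This implies the existence and uniqueness of the risk-neutral measure.
We define

\[ Z(T) = \exp\left\{ -\theta_1 B_1(T) - \theta_2 B_2(T) - \tfrac12 (\theta_1^2 + \theta_2^2) T \right\}, \]
\[ \widetilde{\mathbb{P}}(A) = \int_A Z(T)\, d\mathbb{P}, \quad \forall A \in \mathcal{F}. \]

Under $\widetilde{\mathbb{P}}$, $\widetilde{B}_1$ and $\widetilde{B}_2$ are independent Brownian motions (Girsanov's Theorem). $\widetilde{\mathbb{P}}$ is the unique
risk-neutral measure.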
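Equations (0.1) and (0.2) form a triangular linear system, so they can be solved by back-substitution. A minimal sketch follows; the function name `market_prices_of_risk` and the sample parameters are assumptions made for illustration.

```python
import math


def market_prices_of_risk(lam, mu, r, sigma1, sigma2, rho):
    """Solve (0.1) for theta1, then (0.2) for theta2 (requires |rho| < 1)."""
    theta1 = (lam - r) / sigma1
    theta2 = (mu - r - rho * sigma2 * theta1) / (math.sqrt(1.0 - rho * rho) * sigma2)
    return theta1, theta2
```

Substituting the returned pair back into (0.1) and (0.2) should reproduce the mean returns $\lambda$ and $\mu$.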


Remark 24.2 Under both $\mathbb{P}$ and $\widetilde{\mathbb{P}}$, $Y$ has volatility $\sigma_1$, $S$ has volatility $\sigma_2$ and

\[ \frac{dY\, dS}{Y S} = \rho\sigma_1\sigma_2\, dt, \]

i.e., the correlation between $\frac{dY}{Y}$ and $\frac{dS}{S}$ is $\rho$.

The value of the option at time zero is

\[ v(0, S(0), Y(0)) = \widetilde{\mathbb{E}}\left[ e^{-rT} (S(T) - K)^+ \mathbf{1}_{\{Y^*(T) < L\}} \right]. \]

We need to work out a density which permits us to compute the right-hand side.


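Recall that the barrier process is

\[ \frac{dY}{Y} = r\, dt + \sigma_1\, d\widetilde{B}_1, \]

so

\[ Y(t) = Y(0) \exp\left\{ rt + \sigma_1 \widetilde{B}_1(t) - \tfrac12 \sigma_1^2 t \right\}. \]

Set

\[ \widehat\theta = r/\sigma_1 - \sigma_1/2, \qquad
\widehat{B}(t) = \widehat\theta t + \widetilde{B}_1(t), \qquad
\widehat{M}(T) = \max_{0 \le t \le T} \widehat{B}(t). \]

Then

\[ Y(t) = Y(0) \exp\{ \sigma_1 \widehat{B}(t) \}, \qquad Y^*(T) = Y(0) \exp\{ \sigma_1 \widehat{M}(T) \}. \]

The joint density of $\widehat{B}(T)$ and $\widehat{M}(T)$, appearing in Chapter 20, is

\[ \widetilde{\mathbb{P}}\{ \widehat{B}(T) \in d\widehat{b},\ \widehat{M}(T) \in d\widehat{m} \}
= \frac{2(2\widehat{m} - \widehat{b})}{T\sqrt{2\pi T}} \exp\left\{ -\frac{(2\widehat{m} - \widehat{b})^2}{2T} + \widehat\theta\, \widehat{b} - \tfrac12 \widehat\theta^2 T \right\} d\widehat{b}\, d\widehat{m},
\quad \widehat{m} > 0,\ \widehat{b} < \widehat{m}. \]

The stock process.

\[ \frac{dS}{S} = r\, dt + \rho\sigma_2\, d\widetilde{B}_1 + \sqrt{1 - \rho^2}\, \sigma_2\, d\widetilde{B}_2, \]

so

\[
\begin{aligned}
S(T) &= S(0) \exp\left\{ rT + \rho\sigma_2 \widetilde{B}_1(T) - \tfrac12 \rho^2\sigma_2^2 T + \sqrt{1 - \rho^2}\, \sigma_2 \widetilde{B}_2(T) - \tfrac12 (1 - \rho^2)\sigma_2^2 T \right\} \\
&= S(0) \exp\left\{ rT - \tfrac12 \sigma_2^2 T + \rho\sigma_2 \widetilde{B}_1(T) + \sqrt{1 - \rho^2}\, \sigma_2 \widetilde{B}_2(T) \right\}.
\end{aligned}
\]

From the above paragraph we have

\[ \widetilde{B}_1(T) = -\widehat\theta T + \widehat{B}(T), \]

so

\[ S(T) = S(0) \exp\left\{ rT + \rho\sigma_2 \widehat{B}(T) - \tfrac12 \sigma_2^2 T - \rho\sigma_2 \widehat\theta T + \sqrt{1 - \rho^2}\, \sigma_2 \widetilde{B}_2(T) \right\}. \]
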


24.1 Computing the option value 


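\[
\begin{aligned}
v(0, S(0), Y(0)) &= \widetilde{\mathbb{E}}\left[ e^{-rT} (S(T) - K)^+ \mathbf{1}_{\{Y^*(T) < L\}} \right] \\
&= e^{-rT}\, \widetilde{\mathbb{E}}\left[ \left( S(0) \exp\left\{ (r - \tfrac12 \sigma_2^2 - \rho\sigma_2 \widehat\theta) T + \rho\sigma_2 \widehat{B}(T) + \sqrt{1 - \rho^2}\, \sigma_2 \widetilde{B}_2(T) \right\} - K \right)^+ \mathbf{1}_{\{Y(0) \exp[\sigma_1 \widehat{M}(T)] < L\}} \right].
\end{aligned}
\]

We know the joint density of $(\widehat{B}(T), \widehat{M}(T))$. The density of $\widetilde{B}_2(T)$ is

\[ \widetilde{\mathbb{P}}\{ \widetilde{B}_2(T) \in d\widetilde{b} \} = \frac{1}{\sqrt{2\pi T}} \exp\left\{ -\frac{\widetilde{b}^2}{2T} \right\} d\widetilde{b}, \quad \widetilde{b} \in \mathbb{R}. \]

Furthermore, the pair of random variables $(\widehat{B}(T), \widehat{M}(T))$ is independent of $\widetilde{B}_2(T)$ because $\widetilde{B}_1$ and
$\widetilde{B}_2$ are independent under $\widetilde{\mathbb{P}}$. Therefore, the joint density of the random vector $(\widetilde{B}_2(T), \widehat{B}(T), \widehat{M}(T))$
is

\[ \widetilde{\mathbb{P}}\{ \widetilde{B}_2(T) \in d\widetilde{b},\ \widehat{B}(T) \in d\widehat{b},\ \widehat{M}(T) \in d\widehat{m} \}
= \widetilde{\mathbb{P}}\{ \widetilde{B}_2(T) \in d\widetilde{b} \} \cdot \widetilde{\mathbb{P}}\{ \widehat{B}(T) \in d\widehat{b},\ \widehat{M}(T) \in d\widehat{m} \}. \]

The option value at time zero is

\[
\begin{aligned}
v(0, S(0), Y(0)) = e^{-rT} \int_0^{\frac{1}{\sigma_1} \log \frac{L}{Y(0)}} \int_{-\infty}^{\widehat{m}} \int_{-\infty}^{\infty}
&\left( S(0) \exp\left\{ (r - \tfrac12 \sigma_2^2 - \rho\sigma_2 \widehat\theta) T + \rho\sigma_2 \widehat{b} + \sqrt{1 - \rho^2}\, \sigma_2 \widetilde{b} \right\} - K \right)^+ \\
&\cdot \frac{1}{\sqrt{2\pi T}} \exp\left\{ -\frac{\widetilde{b}^2}{2T} \right\}
\cdot \frac{2(2\widehat{m} - \widehat{b})}{T\sqrt{2\pi T}} \exp\left\{ -\frac{(2\widehat{m} - \widehat{b})^2}{2T} + \widehat\theta\, \widehat{b} - \tfrac12 \widehat\theta^2 T \right\}
d\widetilde{b}\, d\widehat{b}\, d\widehat{m}.
\end{aligned}
\]

The answer depends on $T$, $S(0)$ and $Y(0)$. It also depends on $\sigma_1$, $\sigma_2$, $\rho$, $r$, $K$ and $L$. It does not
depend on $\lambda$, $\mu$, $\theta_1$, nor $\theta_2$. The parameter $\widehat\theta$ appearing in the answer is $\widehat\theta = \frac{r}{\sigma_1} - \frac{\sigma_1}{2}$.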


Remark 24.3 If we had not regarded $Y$ as a traded asset, then we would not have tried to set its
mean return equal to $r$. We would have had only one equation (see Eqs (0.1), (0.2))

\[ \mu = r + \rho\sigma_2\theta_1 + \sqrt{1 - \rho^2}\, \sigma_2\theta_2 \tag{1.1} \]

to determine $\theta_1$ and $\theta_2$. The nonuniqueness of the solution alerts us that some options cannot be
hedged. Indeed, any option whose payoff depends on $Y$ cannot be hedged when we are allowed to
trade only in the stock.

If we have an option whose payoff depends only on $S$, then $Y$ is superfluous. Returning to the
original equation for $S$,

\[ \frac{dS}{S} = \mu\, dt + \rho\sigma_2\, dB_1 + \sqrt{1 - \rho^2}\, \sigma_2\, dB_2, \]

we should set

\[ dW = \rho\, dB_1 + \sqrt{1 - \rho^2}\, dB_2, \]

so $W$ is a Brownian motion under $\mathbb{P}$ (Levy's theorem), and

\[ \frac{dS}{S} = \mu\, dt + \sigma_2\, dW. \]

Now we have only one Brownian motion, so there will be only one $\theta$, namely,

\[ \theta = \frac{\mu - r}{\sigma_2}, \]

so with $d\widetilde{W} = \theta\, dt + dW$, we have

\[ \frac{dS}{S} = r\, dt + \sigma_2\, d\widetilde{W}, \]

and we are on our way.


24.2 The PDE for the outside barrier option 


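Returning to the case of the option with payoff

\[ (S(T) - K)^+ \mathbf{1}_{\{Y^*(T) < L\}}, \]

we obtain a formula for

\[ v(t, x, y) = \widetilde{\mathbb{E}}^{t,x,y}\left[ e^{-r(T-t)} (S(T) - K)^+ \mathbf{1}_{\{\max_{t \le u \le T} Y(u) < L\}} \right] \]

by replacing $T$, $S(0)$ and $Y(0)$ by $T - t$, $x$ and $y$ respectively in the formula for $v(0, S(0), Y(0))$.
Now start at time 0 at $S(0)$ and $Y(0)$. Using the Markov property, we can show that the stochastic
process

\[ e^{-rt} v(t, S(t), Y(t)) \]

is a martingale under $\widetilde{\mathbb{P}}$. We compute

\[
\begin{aligned}
d\left[ e^{-rt} v(t, S(t), Y(t)) \right]
= e^{-rt} \Big[ &\big( -rv + v_t + rS v_x + rY v_y + \tfrac12 \sigma_2^2 S^2 v_{xx} + \rho\sigma_1\sigma_2 SY v_{xy} + \tfrac12 \sigma_1^2 Y^2 v_{yy} \big)\, dt \\
&+ \rho\sigma_2 S v_x\, d\widetilde{B}_1 + \sqrt{1 - \rho^2}\, \sigma_2 S v_x\, d\widetilde{B}_2 + \sigma_1 Y v_y\, d\widetilde{B}_1 \Big].
\end{aligned}
\]

Figure 24.1: Boundary conditions for barrier option. Note that $t \in [0, T]$ is fixed.

Setting the $dt$ term equal to 0, we obtain the PDE

\[ -rv + v_t + rx v_x + ry v_y + \tfrac12 \sigma_2^2 x^2 v_{xx} + \rho\sigma_1\sigma_2 xy\, v_{xy} + \tfrac12 \sigma_1^2 y^2 v_{yy} = 0,
\quad 0 \le t < T,\ x \ge 0,\ 0 \le y \le L. \]

The terminal condition is

\[ v(T, x, y) = (x - K)^+, \quad x \ge 0,\ 0 \le y < L, \]

and the boundary conditions are

\[ v(t, 0, 0) = 0, \quad 0 \le t \le T, \]
\[ v(t, x, L) = 0, \quad 0 \le t \le T,\ x \ge 0. \]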

On the $x = 0$ boundary:

\[ -rv + v_t + ry v_y + \tfrac12 \sigma_1^2 y^2 v_{yy} = 0. \]

This is the usual Black-Scholes equation in $y$. The boundary conditions are

\[ v(t, 0, L) = 0, \qquad v(t, 0, 0) = 0; \]

the terminal condition is

\[ v(T, 0, y) = (0 - K)^+ = 0, \quad y \ge 0. \]

On the $x = 0$ boundary, the option value is $v(t, 0, y) = 0$, $0 \le y \le L$.

On the $y = 0$ boundary:

\[ -rv + v_t + rx v_x + \tfrac12 \sigma_2^2 x^2 v_{xx} = 0. \]

This is the usual Black-Scholes equation in $x$. The boundary condition is

\[ v(t, 0, 0) = e^{-r(T-t)} (0 - K)^+ = 0; \]

the terminal condition is

\[ v(T, x, 0) = (x - K)^+, \quad x \ge 0. \]

On the $y = 0$ boundary, the barrier is irrelevant, and the option value is given by
the usual Black-Scholes formula for a European call.


24.3. The hedge 


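After setting the $dt$ term to 0, we have the equation

\[ d\left[ e^{-rt} v(t, S(t), Y(t)) \right]
= e^{-rt} \left[ \rho\sigma_2 S v_x\, d\widetilde{B}_1 + \sqrt{1 - \rho^2}\, \sigma_2 S v_x\, d\widetilde{B}_2 + \sigma_1 Y v_y\, d\widetilde{B}_1 \right], \]

where $v_x = v_x(t, S(t), Y(t))$, $v_y = v_y(t, S(t), Y(t))$, and $\widetilde{B}_1$, $\widetilde{B}_2$, $S$, $Y$ are functions of $t$. Note
that

\[
\begin{aligned}
d\left[ e^{-rt} S(t) \right] &= e^{-rt} \left[ -rS(t)\, dt + dS(t) \right] \\
&= e^{-rt} \left[ \rho\sigma_2 S(t)\, d\widetilde{B}_1(t) + \sqrt{1 - \rho^2}\, \sigma_2 S(t)\, d\widetilde{B}_2(t) \right], \\
d\left[ e^{-rt} Y(t) \right] &= e^{-rt} \left[ -rY(t)\, dt + dY(t) \right] \\
&= e^{-rt} \sigma_1 Y(t)\, d\widetilde{B}_1(t).
\end{aligned}
\]

Therefore,

\[ d\left[ e^{-rt} v(t, S(t), Y(t)) \right] = v_x\, d\left[ e^{-rt} S \right] + v_y\, d\left[ e^{-rt} Y \right]. \]

Let $\Delta_2(t)$ denote the number of shares of stock held at time $t$, and let $\Delta_1(t)$ denote the number of
"shares" of the barrier process $Y$. The value $X(t)$ of the portfolio has the differential

\[ dX = \Delta_2\, dS + \Delta_1\, dY + r\left[ X - \Delta_2 S - \Delta_1 Y \right] dt. \]

This is equivalent to

\[ d\left[ e^{-rt} X(t) \right] = \Delta_2(t)\, d\left[ e^{-rt} S(t) \right] + \Delta_1(t)\, d\left[ e^{-rt} Y(t) \right]. \]

To get $X(t) = v(t, S(t), Y(t))$ for all $t$, we must have

\[ X(0) = v(0, S(0), Y(0)) \]

and

\[ \Delta_2(t) = v_x(t, S(t), Y(t)), \qquad \Delta_1(t) = v_y(t, S(t), Y(t)). \]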


Chapter 25 


American Options 


This and the following chapters form part of the course Stochastic Differential Equations for Finance II.


25.1 Preview of perpetual American put 


\[ dS = rS\, dt + \sigma S\, dB \]

Intrinsic value at time $t$: $(K - S(t))^+$.
Let $L \in [0, K]$ be given. Suppose we exercise the first time the stock price is $L$ or lower. We define

\[ \tau_L = \min\{ t \ge 0;\ S(t) \le L \}, \]
\[ v_L(x) = \mathbb{E}^x\left[ e^{-r\tau_L} (K - S(\tau_L))^+ \right]
= \begin{cases} K - x & \text{if } x \le L, \\ (K - L)\, \mathbb{E}^x e^{-r\tau_L} & \text{if } x > L. \end{cases} \]

The plan is to compute $v_L(x)$ and then maximize over $L$ to find the optimal exercise price. We need
to know the distribution of $\tau_L$.


25.2 First passage times for Brownian motion: first method 


(Based on the reflection principle) 


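Let $B$ be a Brownian motion under $\mathbb{P}$, let $x > 0$ be given, and define

\[ \tau = \min\{ t \ge 0;\ B(t) = x \}. \]

$\tau$ is called the first passage time to $x$. We compute the distribution of $\tau$.

Figure 25.1: Intrinsic value of perpetual American put (horizontal axis: stock price $x$, marked at $K$; vertical axis: intrinsic value).

Define

\[ M(t) = \max_{0 \le u \le t} B(u). \]

From the first section of Chapter 20 we have

\[ \mathbb{P}\{ M(t) \in dm,\ B(t) \in db \} = \frac{2(2m - b)}{t\sqrt{2\pi t}} \exp\left\{ -\frac{(2m - b)^2}{2t} \right\} dm\, db, \quad m > 0,\ b < m. \]

Therefore,

\[
\begin{aligned}
\mathbb{P}\{ M(t) \ge x \} &= \int_x^\infty \int_{-\infty}^m \frac{2(2m - b)}{t\sqrt{2\pi t}} \exp\left\{ -\frac{(2m - b)^2}{2t} \right\} db\, dm \\
&= \int_x^\infty \frac{2}{\sqrt{2\pi t}} \exp\left\{ -\frac{(2m - b)^2}{2t} \right\} \Bigg|_{b = -\infty}^{b = m} dm \\
&= \int_x^\infty \frac{2}{\sqrt{2\pi t}} \exp\left\{ -\frac{m^2}{2t} \right\} dm.
\end{aligned}
\]

Now

\[ \tau \le t \iff M(t) \ge x, \]

so

\[
\begin{aligned}
\mathbb{P}\{ \tau \in dt \} &= \frac{\partial}{\partial t} \mathbb{P}\{ \tau \le t \}\, dt
= \frac{\partial}{\partial t} \mathbb{P}\{ M(t) \ge x \}\, dt \\
&= \left[ \frac{\partial}{\partial t} \int_x^\infty \frac{2}{\sqrt{2\pi t}} \exp\left\{ -\frac{m^2}{2t} \right\} dm \right] dt
= \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -\frac{x^2}{2t} \right\} dt.
\end{aligned}
\]

We also have the Laplace transform formula

\[ \mathbb{E} e^{-\alpha\tau} = \int_0^\infty e^{-\alpha t}\, \mathbb{P}\{ \tau \in dt \} = e^{-x\sqrt{2\alpha}}, \quad \alpha > 0. \quad \text{(See Homework)} \]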


Reference: Karatzas and Shreve, Brownian Motion and Stochastic Calculus, pp 95-96. 
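The reflection-principle computation can be checked numerically. Substituting $m = \sqrt{t}\, z$ in $\int_x^\infty \frac{2}{\sqrt{2\pi t}} e^{-m^2/(2t)}\, dm$ gives $\mathbb{P}\{\tau \le t\} = 2(1 - N(x/\sqrt{t}))$, where $N$ is the standard normal distribution function. The sketch below (hypothetical helper names, central finite difference chosen for illustration) differentiates this in $t$ and compares it with the first passage density $\frac{x}{t\sqrt{2\pi t}} e^{-x^2/(2t)}$.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_passage_by(t, x):
    """P{tau <= t} = P{M(t) >= x} = 2 (1 - N(x / sqrt(t))) for x > 0."""
    return 2.0 * (1.0 - norm_cdf(x / math.sqrt(t)))


def passage_density(t, x):
    """First passage density x / (t sqrt(2 pi t)) exp(-x^2 / (2 t))."""
    return x / (t * math.sqrt(2.0 * math.pi * t)) * math.exp(-x * x / (2.0 * t))


def central_difference(f, t, h=1e-4):
    """Numerical derivative of f at t."""
    return (f(t + h) - f(t - h)) / (2.0 * h)
```

For instance, `central_difference(lambda s: prob_passage_by(s, 1.2), 0.7)` should be close to `passage_density(0.7, 1.2)`.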


25.3 Drift adjustment 


Reference: Karatzas/Shreve, Brownian motion and Stochastic Calculus, pp 196-197. 


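For $0 \le t < \infty$, define

\[ \widetilde{B}(t) = \theta t + B(t), \]
\[ Z(t) = \exp\left\{ -\theta B(t) - \tfrac12 \theta^2 t \right\} = \exp\left\{ -\theta \widetilde{B}(t) + \tfrac12 \theta^2 t \right\}. \]

Define

\[ \widetilde\tau = \min\{ t \ge 0;\ \widetilde{B}(t) = x \}. \]

We fix a finite time $T$ and change the probability measure "only up to $T$". More specifically, with
$T$ fixed, define

\[ \widetilde{\mathbb{P}}(A) = \int_A Z(T)\, d\mathbb{P}, \quad A \in \mathcal{F}(T). \]

Under $\widetilde{\mathbb{P}}$, the process $\widetilde{B}(t)$, $0 \le t \le T$, is a (nondrifted) Brownian motion, so

\[ \widetilde{\mathbb{P}}\{ \widetilde\tau \in dt \} = \mathbb{P}\{ \tau \in dt \} = \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -\frac{x^2}{2t} \right\} dt, \quad 0 < t \le T. \]

For $0 < t \le T$ we have

\[
\begin{aligned}
\mathbb{P}\{ \widetilde\tau \le t \} &= \mathbb{E}\left[ \mathbf{1}_{\{\widetilde\tau \le t\}} \right]
= \widetilde{\mathbb{E}}\left[ \mathbf{1}_{\{\widetilde\tau \le t\}} \frac{1}{Z(T)} \right] \\
&= \widetilde{\mathbb{E}}\left[ \mathbf{1}_{\{\widetilde\tau \le t\}} \exp\left\{ \theta \widetilde{B}(T) - \tfrac12 \theta^2 T \right\} \right] \\
&= \widetilde{\mathbb{E}}\left[ \mathbf{1}_{\{\widetilde\tau \le t\}}\, \widetilde{\mathbb{E}}\left[ \exp\left\{ \theta \widetilde{B}(T) - \tfrac12 \theta^2 T \right\} \,\middle|\, \mathcal{F}(\widetilde\tau \wedge t) \right] \right] \\
&= \widetilde{\mathbb{E}}\left[ \mathbf{1}_{\{\widetilde\tau \le t\}} \exp\left\{ \theta \widetilde{B}(\widetilde\tau \wedge t) - \tfrac12 \theta^2 (\widetilde\tau \wedge t) \right\} \right] \\
&= \widetilde{\mathbb{E}}\left[ \mathbf{1}_{\{\widetilde\tau \le t\}} \exp\left\{ \theta x - \tfrac12 \theta^2 \widetilde\tau \right\} \right] \\
&= \int_0^t \exp\left\{ \theta x - \tfrac12 \theta^2 s \right\} \widetilde{\mathbb{P}}\{ \widetilde\tau \in ds \} \\
&= \int_0^t \frac{x}{s\sqrt{2\pi s}} \exp\left\{ \theta x - \tfrac12 \theta^2 s - \frac{x^2}{2s} \right\} ds \\
&= \int_0^t \frac{x}{s\sqrt{2\pi s}} \exp\left\{ -\frac{(x - \theta s)^2}{2s} \right\} ds.
\end{aligned}
\]

Therefore,

\[ \mathbb{P}\{ \widetilde\tau \in dt \} = \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -\frac{(x - \theta t)^2}{2t} \right\} dt, \quad 0 < t \le T. \]

Since $T$ is arbitrary, this must in fact be the correct formula for all $t > 0$.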


25.4 Drift-adjusted Laplace transform 


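Recall the Laplace transform formula for

\[ \tau = \min\{ t \ge 0;\ B(t) = x \} \]

for nondrifted Brownian motion:

\[ \mathbb{E} e^{-\alpha\tau} = \int_0^\infty \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -\alpha t - \frac{x^2}{2t} \right\} dt = e^{-x\sqrt{2\alpha}}, \quad \alpha > 0,\ x > 0. \]

For

\[ \widetilde\tau = \min\{ t \ge 0;\ \theta t + B(t) = x \}, \]

the Laplace transform is

\[
\begin{aligned}
\mathbb{E} e^{-\alpha\widetilde\tau} &= \int_0^\infty \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -\alpha t - \frac{(x - \theta t)^2}{2t} \right\} dt \\
&= \int_0^\infty \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -\alpha t - \frac{x^2}{2t} + x\theta - \tfrac12 \theta^2 t \right\} dt \\
&= e^{x\theta} \int_0^\infty \frac{x}{t\sqrt{2\pi t}} \exp\left\{ -(\alpha + \tfrac12 \theta^2) t - \frac{x^2}{2t} \right\} dt \\
&= e^{x\theta - x\sqrt{2\alpha + \theta^2}}, \quad \alpha > 0,\ x > 0,
\end{aligned}
\]

where in the last step we have used the formula for $\mathbb{E} e^{-\alpha\tau}$ with $\alpha$ replaced by $\alpha + \tfrac12 \theta^2$.

If $\widetilde\tau(\omega) < \infty$, then

\[ \lim_{\alpha \downarrow 0} e^{-\alpha\widetilde\tau(\omega)} = 1; \]

if $\widetilde\tau(\omega) = \infty$, then $e^{-\alpha\widetilde\tau(\omega)} = 0$ for every $\alpha > 0$, so

\[ \lim_{\alpha \downarrow 0} e^{-\alpha\widetilde\tau(\omega)} = 0. \]

Therefore,

\[ \lim_{\alpha \downarrow 0} e^{-\alpha\widetilde\tau(\omega)} = \mathbf{1}_{\{\widetilde\tau < \infty\}}. \]

Letting $\alpha \downarrow 0$ and using the Monotone Convergence Theorem in the Laplace transform formula

\[ \mathbb{E} e^{-\alpha\widetilde\tau} = e^{x\theta - x\sqrt{2\alpha + \theta^2}}, \]

we obtain

\[ \mathbb{P}\{ \widetilde\tau < \infty \} = e^{x\theta - x\sqrt{\theta^2}} = e^{x\theta - x|\theta|}. \]

If $\theta \ge 0$, then

\[ \mathbb{P}\{ \widetilde\tau < \infty \} = 1. \]

If $\theta < 0$, then

\[ \mathbb{P}\{ \widetilde\tau < \infty \} = e^{2x\theta} < 1. \]

(Recall that $x > 0$.)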
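A numerical check of the drift-adjusted Laplace transform can be sketched as follows: integrate $e^{-\alpha t}$ against the first passage density $\frac{x}{t\sqrt{2\pi t}} \exp\{-(x - \theta t)^2/(2t)\}$ and compare with $e^{x\theta - x\sqrt{2\alpha + \theta^2}}$. The function names, the trapezoid rule and the truncation point `t_max` are assumptions of this sketch; taking $\alpha = 0$ gives the probability that the level is ever reached.

```python
import math


def first_passage_density(t, x, theta):
    """Density of the first time theta t + B(t) reaches the level x > 0."""
    return x / (t * math.sqrt(2.0 * math.pi * t)) * math.exp(-(x - theta * t) ** 2 / (2.0 * t))


def laplace_numeric(alpha, x, theta, t_max=80.0, n=200000):
    """Trapezoid approximation of the integral of exp(-alpha t) * density over (0, t_max].

    The integrand tends to 0 as t decreases to 0, so the left endpoint adds nothing.
    """
    h = t_max / n
    total = 0.0
    for i in range(1, n + 1):
        t = i * h
        weight = 0.5 if i == n else 1.0
        total += weight * math.exp(-alpha * t) * first_passage_density(t, x, theta)
    return total * h


def laplace_closed_form(alpha, x, theta):
    """exp(x theta - x sqrt(2 alpha + theta^2))."""
    return math.exp(x * theta - x * math.sqrt(2.0 * alpha + theta * theta))
```

With $x = 1$, $\theta = -1$ and $\alpha = 0$ the numerical value should be close to $e^{-2}$, the probability of ever reaching the level when the drift points away from it.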


25.5 First passage times: Second method 


(Based on martingales) 


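Let $\sigma > 0$ be given. Then

\[ Y(t) = \exp\left\{ \sigma B(t) - \tfrac12 \sigma^2 t \right\} \]

is a martingale, so $Y(t \wedge \tau)$ is also a martingale. We have

\[
\begin{aligned}
1 &= Y(0 \wedge \tau) \\
&= \mathbb{E}\, Y(t \wedge \tau) \\
&= \mathbb{E} \exp\left\{ \sigma B(t \wedge \tau) - \tfrac12 \sigma^2 (t \wedge \tau) \right\} \\
&= \lim_{t \to \infty} \mathbb{E} \exp\left\{ \sigma B(t \wedge \tau) - \tfrac12 \sigma^2 (t \wedge \tau) \right\}.
\end{aligned}
\]

We want to take the limit inside the expectation. Since

\[ 0 \le \exp\left\{ \sigma B(t \wedge \tau) - \tfrac12 \sigma^2 (t \wedge \tau) \right\} \le e^{\sigma x}, \]

this is justified by the Bounded Convergence Theorem. Therefore,

\[ 1 = \mathbb{E} \lim_{t \to \infty} \exp\left\{ \sigma B(t \wedge \tau) - \tfrac12 \sigma^2 (t \wedge \tau) \right\}. \]

There are two possibilities. For those $\omega$ for which $\tau(\omega) < \infty$,

\[ \lim_{t \to \infty} \exp\left\{ \sigma B(t \wedge \tau) - \tfrac12 \sigma^2 (t \wedge \tau) \right\} = e^{\sigma x - \frac12 \sigma^2 \tau}. \]

For those $\omega$ for which $\tau(\omega) = \infty$,

\[ \lim_{t \to \infty} \exp\left\{ \sigma B(t \wedge \tau) - \tfrac12 \sigma^2 (t \wedge \tau) \right\} \le \lim_{t \to \infty} \exp\left\{ \sigma x - \tfrac12 \sigma^2 t \right\} = 0. \]

Therefore,

\[
\begin{aligned}
1 &= \mathbb{E} \lim_{t \to \infty} \exp\left\{ \sigma B(t \wedge \tau) - \tfrac12 \sigma^2 (t \wedge \tau) \right\} \\
&= \mathbb{E}\left[ e^{\sigma x - \frac12 \sigma^2 \tau} \mathbf{1}_{\{\tau < \infty\}} \right] \\
&= \mathbb{E}\, e^{\sigma x - \frac12 \sigma^2 \tau},
\end{aligned}
\]

where we understand $e^{\sigma x - \frac12 \sigma^2 \tau}$ to be zero if $\tau = \infty$.

Let $\alpha = \tfrac12 \sigma^2$, so $\sigma = \sqrt{2\alpha}$. We have again derived the Laplace transform formula

\[ e^{-x\sqrt{2\alpha}} = \mathbb{E} e^{-\alpha\tau}, \quad \alpha > 0,\ x > 0, \]

for the first passage time for nondrifted Brownian motion.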


25.6 Perpetual American put 


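\[ dS = rS\, dt + \sigma S\, dB, \qquad S(0) = x, \]
\[ S(t) = x \exp\left\{ (r - \tfrac12 \sigma^2) t + \sigma B(t) \right\}
= x \exp\Big\{ \sigma \Big[ \underbrace{\Big( \frac{r}{\sigma} - \frac{\sigma}{2} \Big)}_{\theta} t + B(t) \Big] \Big\}. \]

Intrinsic value of the put at time $t$: $(K - S(t))^+$.
Let $L \in [0, K]$ be given. Define for $x \ge L$,

\[
\begin{aligned}
\tau_L &= \min\{ t \ge 0;\ S(t) = L \} \\
&= \min\left\{ t \ge 0;\ \theta t + B(t) = \frac{1}{\sigma} \log \frac{L}{x} \right\} \\
&= \min\left\{ t \ge 0;\ -\theta t - B(t) = \frac{1}{\sigma} \log \frac{x}{L} \right\}.
\end{aligned}
\]

Define

\[
\begin{aligned}
v_L &= (K - L)\, \mathbb{E} e^{-r\tau_L} \\
&= (K - L) \exp\left\{ -\frac{\theta}{\sigma} \log \frac{x}{L} - \frac{1}{\sigma} \log \frac{x}{L} \sqrt{2r + \theta^2} \right\} \\
&= (K - L) \left( \frac{x}{L} \right)^{-\frac{\theta}{\sigma} - \frac{1}{\sigma} \sqrt{2r + \theta^2}}.
\end{aligned}
\]

We compute the exponent

\[
\begin{aligned}
-\frac{\theta}{\sigma} - \frac{1}{\sigma} \sqrt{2r + \theta^2}
&= -\frac{r}{\sigma^2} + \frac12 - \frac{1}{\sigma} \sqrt{2r + \left( \frac{r}{\sigma} - \frac{\sigma}{2} \right)^2} \\
&= -\frac{r}{\sigma^2} + \frac12 - \frac{1}{\sigma} \sqrt{\frac{r^2}{\sigma^2} + r + \frac{\sigma^2}{4}} \\
&= -\frac{r}{\sigma^2} + \frac12 - \frac{1}{\sigma} \sqrt{\left( \frac{r}{\sigma} + \frac{\sigma}{2} \right)^2} \\
&= -\frac{r}{\sigma^2} + \frac12 - \frac{1}{\sigma} \left( \frac{r}{\sigma} + \frac{\sigma}{2} \right) \\
&= -\frac{2r}{\sigma^2}.
\end{aligned}
\]

Therefore,

\[ v_L(x) = \begin{cases} K - x, & 0 \le x \le L, \\ (K - L) \left( \frac{x}{L} \right)^{-2r/\sigma^2}, & x \ge L. \end{cases} \]

The curves $(K - L) \left( \frac{x}{L} \right)^{-2r/\sigma^2}$ are all of the form $C x^{-2r/\sigma^2}$.

We want to choose the largest possible constant. The constant is

\[ C = (K - L) L^{2r/\sigma^2}, \]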

Figure 25.2: Value of perpetual American put, showing the curve $(K - L)(x/L)^{-2r/\sigma^2}$ (horizontal axis: stock price $x$, marked at $K$).

Figure 25.3: Curves (horizontal axis: stock price $x$; vertical axis: value).



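and

\[
\begin{aligned}
\frac{\partial C}{\partial L} &= -L^{2r/\sigma^2} + \frac{2r}{\sigma^2} (K - L) L^{\frac{2r}{\sigma^2} - 1} \\
&= L^{2r/\sigma^2} \left[ -1 + \frac{2r}{\sigma^2} (K - L) \frac{1}{L} \right] \\
&= L^{2r/\sigma^2} \left[ -\left( 1 + \frac{2r}{\sigma^2} \right) + \frac{2r}{\sigma^2} \frac{K}{L} \right].
\end{aligned}
\]

We solve

\[ -\left( 1 + \frac{2r}{\sigma^2} \right) + \frac{2r}{\sigma^2} \frac{K}{L} = 0 \]

to get

\[ L = \frac{2rK}{\sigma^2 + 2r}. \]

Since $0 < 2r < \sigma^2 + 2r$, we have

\[ 0 < L < K. \]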


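Solution to the perpetual American put pricing problem (see Fig. 25.4):

\[ v(x) = \begin{cases} K - x, & 0 \le x \le L^*, \\ (K - L^*) \left( \frac{x}{L^*} \right)^{-2r/\sigma^2}, & x \ge L^*, \end{cases} \]

where

\[ L^* = \frac{2rK}{\sigma^2 + 2r}. \]

Note that

\[ v'(x) = \begin{cases} -1, & 0 \le x < L^*, \\ -\frac{2r}{\sigma^2} (K - L^*) (L^*)^{2r/\sigma^2} x^{-\frac{2r}{\sigma^2} - 1}, & x > L^*. \end{cases} \]

We have

\[
\begin{aligned}
\lim_{x \downarrow L^*} v'(x) &= -\frac{2r}{\sigma^2} (K - L^*) \frac{1}{L^*} \\
&= -\frac{2r}{\sigma^2} \left( K - \frac{2rK}{\sigma^2 + 2r} \right) \frac{\sigma^2 + 2r}{2rK} \\
&= -\frac{2r}{\sigma^2} \left( \frac{\sigma^2 + 2r - 2r}{\sigma^2 + 2r} \right) \frac{\sigma^2 + 2r}{2r} \\
&= -1 \\
&= \lim_{x \uparrow L^*} v'(x).
\end{aligned}
\]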

Figure 25.4: Solution to perpetual American put, showing the curve $(K - L^*)(x/L^*)^{-2r/\sigma^2}$ (horizontal axis: stock price $x$, marked at $L^*$ and $K$).


25.7 Value of the perpetual American put 


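Set

\[ \gamma = \frac{2r}{\sigma^2}, \qquad L^* = \frac{2rK}{\sigma^2 + 2r} = \frac{\gamma}{\gamma + 1} K. \]

If $0 \le x < L^*$, then $v(x) = K - x$. If $L^* \le x < \infty$, then

\[ v(x) = \underbrace{(K - L^*)(L^*)^{\gamma}}_{C}\, x^{-\gamma} \tag{7.1} \]
\[ \phantom{v(x)} = \mathbb{E}^x\left[ e^{-r\tau} (K - L^*)^+ \mathbf{1}_{\{\tau < \infty\}} \right], \tag{7.2} \]

where

\[ S(0) = x, \tag{7.3} \]
\[ \tau = \min\{ t \ge 0;\ S(t) = L^* \}. \tag{7.4} \]

If $0 \le x < L^*$, then

\[ -r v(x) + r x v'(x) + \tfrac12 \sigma^2 x^2 v''(x) = -r(K - x) + rx(-1) = -rK. \]

If $L^* < x < \infty$, then

\[
\begin{aligned}
-r v(x) + r x v'(x) + \tfrac12 \sigma^2 x^2 v''(x)
&= C \left[ -r x^{-\gamma} - r x \gamma x^{-\gamma - 1} + \tfrac12 \sigma^2 x^2 \gamma(\gamma + 1) x^{-\gamma - 2} \right] \\
&= C x^{-\gamma} \left[ -r - r\gamma + \tfrac12 \sigma^2 \gamma(\gamma + 1) \right] \\
&= C (\gamma + 1) x^{-\gamma} \left[ -r + \tfrac12 \sigma^2 \frac{2r}{\sigma^2} \right] \\
&= 0.
\end{aligned}
\]

In other words, $v$ solves the linear complementarity problem: (See Fig. 25.5).
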

Figure 25.5: Linear complementarity

For all $x \in \mathbb{R}$, $x \ne L^*$,

\[ rv - rxv' - \tfrac12 \sigma^2 x^2 v'' \ge 0, \tag{a} \]
\[ v \ge (K - x)^+, \tag{b} \]
One of the inequalities (a) or (b) is an equality. (c)

The half-line $[0, \infty)$ is divided into two regions:

\[ \mathcal{C} = \{ x;\ v(x) > (K - x)^+ \}, \]
\[ \mathcal{S} = \{ x;\ rv - rxv' - \tfrac12 \sigma^2 x^2 v'' > 0 \}, \]

and $L^*$ is the boundary between them. If the stock price is in $\mathcal{C}$, the owner of the put should not
exercise (should "continue"). If the stock price is in $\mathcal{S}$ or at $L^*$, the owner of the put should exercise
(should "stop").
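The linear complementarity conditions can be checked pointwise with the closed-form derivatives of $v$. In the sketch below the function names are hypothetical and the parameter values used in the usage note are sample inputs; the residual $rv - rxv' - \tfrac12\sigma^2 x^2 v''$ should equal $rK$ in the stopping region and vanish in the continuation region.

```python
def perpetual_put(x, strike, r, sigma):
    """Return (v, v', v'') at x != L*, using gamma = 2r / sigma^2 and L* = gamma K / (gamma + 1)."""
    gamma = 2.0 * r / sigma ** 2
    level = gamma / (gamma + 1.0) * strike
    if x < level:
        return strike - x, -1.0, 0.0
    c = (strike - level) * level ** gamma
    return (c * x ** (-gamma),
            -gamma * c * x ** (-gamma - 1.0),
            gamma * (gamma + 1.0) * c * x ** (-gamma - 2.0))


def lcp_residual(x, strike, r, sigma):
    """r v - r x v' - (1/2) sigma^2 x^2 v'' at the point x."""
    value, first, second = perpetual_put(x, strike, r, sigma)
    return r * value - r * x * first - 0.5 * sigma ** 2 * x ** 2 * second
```

With $K = 100$, $r = 0.05$, $\sigma = 0.2$, the residual at $x = 50$ is $rK = 5$, while at $x = 90$ it is zero up to rounding and $v(90)$ strictly exceeds the intrinsic value.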


25.8 Hedging the put 


Let $S(0)$ be given. Sell the put at time zero for $v(S(0))$. Invest the money, holding $\Delta(t)$ shares of
stock and consuming at rate $C(t)$ at time $t$. The value $X(t)$ of this portfolio is governed by

\[ dX(t) = \Delta(t)\, dS(t) + r(X(t) - \Delta(t) S(t))\, dt - C(t)\, dt, \]

or equivalently,

\[ d(e^{-rt} X(t)) = -e^{-rt} C(t)\, dt + e^{-rt} \Delta(t) \sigma S(t)\, dB(t). \]

The discounted value of the put satisfies

\[
\begin{aligned}
d\left( e^{-rt} v(S(t)) \right) &= e^{-rt} \left[ -r v(S(t)) + r S(t) v'(S(t)) + \tfrac12 \sigma^2 S^2(t) v''(S(t)) \right] dt + e^{-rt} \sigma S(t) v'(S(t))\, dB(t) \\
&= -rK e^{-rt} \mathbf{1}_{\{S(t) < L^*\}}\, dt + e^{-rt} \sigma S(t) v'(S(t))\, dB(t).
\end{aligned}
\]

We should set

\[ C(t) = rK \mathbf{1}_{\{S(t) < L^*\}}, \qquad \Delta(t) = v'(S(t)). \]

Remark 25.1 If $S(t) < L^*$, then

\[ v(S(t)) = K - S(t), \qquad \Delta(t) = v'(S(t)) = -1. \]

To hedge the put when $S(t) < L^*$, short one share of stock and hold $K$ in the money market. As
long as the owner does not exercise, you can consume the interest from the money market position,
i.e.,

\[ C(t) = rK \mathbf{1}_{\{S(t) < L^*\}}. \]

Properties of $e^{-rt} v(S(t))$:
1. $e^{-rt} v(S(t))$ is a supermartingale (see its differential above).
2. $e^{-rt} v(S(t)) \ge e^{-rt} (K - S(t))^+$, $0 \le t < \infty$;
3. $e^{-rt} v(S(t))$ is the smallest process with properties 1 and 2.

Explanation of property 3. Let $Y$ be a supermartingale satisfying

\[ Y(t) \ge e^{-rt} (K - S(t))^+, \quad 0 \le t < \infty. \tag{8.1} \]

Then property 3 says that

\[ Y(t) \ge e^{-rt} v(S(t)), \quad 0 \le t < \infty. \tag{8.2} \]

We use (8.1) to prove (8.2) for $t = 0$, i.e.,

\[ Y(0) \ge v(S(0)). \tag{8.3} \]

If $t$ is not zero, we can take $t$ to be the initial time and $S(t)$ to be the initial stock price, and then
adapt the argument below to prove property (8.2).

Proof of (8.3), assuming $Y$ is a supermartingale satisfying (8.1):

Case I: $S(0) \le L^*$. By (8.1), we have

\[ Y(0) \ge (K - S(0))^+ = v(S(0)). \]

Case II: $S(0) > L^*$. For $T > 0$, we have

\[
\begin{aligned}
Y(0) &\ge \mathbb{E}\, Y(\tau \wedge T) && \text{(Stopped supermartingale is a supermartingale)} \\
&\ge \mathbb{E}\left[ Y(\tau \wedge T) \mathbf{1}_{\{\tau < \infty\}} \right]. && \text{(Since } Y \ge 0)
\end{aligned}
\]

Now let $T \to \infty$ to get

\[
\begin{aligned}
Y(0) &\ge \lim_{T \to \infty} \mathbb{E}\left[ Y(\tau \wedge T) \mathbf{1}_{\{\tau < \infty\}} \right] \\
&\ge \mathbb{E}\left[ Y(\tau) \mathbf{1}_{\{\tau < \infty\}} \right] && \text{(Fatou's Lemma)} \\
&\ge \mathbb{E}\left[ e^{-r\tau} (K - S(\tau))^+ \mathbf{1}_{\{\tau < \infty\}} \right] && \text{(by 8.1, with } S(\tau) = L^*) \\
&= v(S(0)). && \text{(See eq. 7.2)}
\end{aligned}
\]


25.9 Perpetual American contingent claim 


Intrinsic value: $h(S(t))$.
Value of the American contingent claim:

\[ v(x) = \sup_{\tau} \mathbb{E}^x\left[ e^{-r\tau} h(S(\tau)) \right], \]

where the supremum is over all stopping times.

Optimal exercise rule: Any stopping time $\tau$ which attains the supremum.

Characterization of $v$:

1. $e^{-rt} v(S(t))$ is a supermartingale;
2. $e^{-rt} v(S(t)) \ge e^{-rt} h(S(t))$, $0 \le t < \infty$;
3. $e^{-rt} v(S(t))$ is the smallest process with properties 1 and 2.


25.10 Perpetual American call 


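\[ v(x) = \sup_{\tau} \mathbb{E}^x\left[ e^{-r\tau} (S(\tau) - K)^+ \right] \]

Theorem 10.63

\[ v(x) = x \quad \forall x \ge 0. \]

Proof: For every $t$,

\[ v(x) \ge \mathbb{E}^x\left[ e^{-rt} (S(t) - K)^+ \right] \ge \mathbb{E}^x\left[ e^{-rt} (S(t) - K) \right] = \mathbb{E}^x\left[ e^{-rt} S(t) \right] - e^{-rt} K = x - e^{-rt} K. \]

Let $t \to \infty$ to get $v(x) \ge x$.
Now start with $S(0) = x$ and define

\[ Y(t) = e^{-rt} S(t). \]

Then:

1. $Y$ is a supermartingale (in fact, $Y$ is a martingale);
2. $Y(t) \ge e^{-rt} (S(t) - K)^+$, $0 \le t < \infty$.

Therefore, $Y(0) \ge v(S(0))$, i.e.,

\[ x \ge v(x). \]

Remark 25.2 No matter what $\tau$ we choose,

\[ \mathbb{E}^x\left[ e^{-r\tau} (S(\tau) - K)^+ \right] < \mathbb{E}^x\left[ e^{-r\tau} S(\tau) \right] \le x = v(x). \]

There is no optimal exercise time.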


25.11 Put with expiration 


Expiration time: $T > 0$.
Intrinsic value: $(K - S(t))^+$.
Value of the put:

\[ v(t, x) = (\text{value of the put at time } t \text{ if } S(t) = x)
= \sup_{\substack{t \le \tau \le T \\ \tau:\ \text{stopping time}}} \mathbb{E}^x\, e^{-r(\tau - t)} (K - S(\tau))^+. \]

See Fig. 25.6. It can be shown that $v$, $v_t$, $v_x$ are continuous across the boundary, while $v_{xx}$ has a
jump.

Figure 25.6: Value of put with expiration. Terminal values: $v(T, x) = 0$ for $x \ge K$; $v(T, x) = K - x$ for $0 \le x \le K$.

Let $S(0)$ be given. Then

1. $e^{-rt} v(t, S(t))$, $0 \le t \le T$, is a supermartingale;
2. $e^{-rt} v(t, S(t)) \ge e^{-rt} (K - S(t))^+$, $0 \le t \le T$;
3. $e^{-rt} v(t, S(t))$ is the smallest process with properties 1 and 2.


25.12 American contingent claim with expiration 


Expiration time: $T > 0$.
Intrinsic value: $h(S(t))$.

Value of the contingent claim:

\[ v(t, x) = \sup_{t \le \tau \le T} \mathbb{E}^x\, e^{-r(\tau - t)} h(S(\tau)). \]

Then

\[ rv - v_t - rx v_x - \tfrac12 \sigma^2 x^2 v_{xx} \ge 0, \tag{a} \]
\[ v \ge h(x), \tag{b} \]
At every point $(t, x) \in [0, T] \times [0, \infty)$, either (a) or (b) is an equality. (c)

Characterization of $v$: Let $S(0)$ be given. Then

1. $e^{-rt} v(t, S(t))$, $0 \le t \le T$, is a supermartingale;
2. $e^{-rt} v(t, S(t)) \ge e^{-rt} h(S(t))$;
3. $e^{-rt} v(t, S(t))$ is the smallest process with properties 1 and 2.

The optimal exercise time is

\[ \tau = \min\{ t \ge 0;\ v(t, S(t)) = h(S(t)) \}. \]

If $\tau(\omega) = \infty$, then there is no optimal exercise time along the particular path $\omega$.


Chapter 26 


Options on dividend-paying stocks 


26.1 American option with convex payoff function 


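Theorem 1.64 Consider the stock price process

\[ dS(t) = r(t) S(t)\, dt + \sigma(t) S(t)\, dB(t), \]

where $r$ and $\sigma$ are processes and $r(t) \ge 0$, $0 \le t \le T$, a.s. This stock pays no dividends.
Let $h(x)$ be a convex function of $x \ge 0$, and assume $h(0) = 0$. (E.g., $h(x) = (x - K)^+$). An
American contingent claim paying $h(S(t))$ if exercised at time $t$ does not need to be exercised
before expiration, i.e., waiting until expiration to decide whether to exercise entails no loss of value.

Proof: For $0 \le \alpha \le 1$ and $x \ge 0$, we have

\[ h(\alpha x) = h((1 - \alpha) 0 + \alpha x) \le (1 - \alpha) h(0) + \alpha h(x) = \alpha h(x). \]

Let $T$ be the time of expiration of the contingent claim. For $0 \le t \le T$,

\[ 0 \le \frac{\beta(t)}{\beta(T)} = \exp\left\{ -\int_t^T r(u)\, du \right\} \le 1 \]

and $S(T) \ge 0$, so

\[ h\left( \frac{\beta(t)}{\beta(T)} S(T) \right) \le \frac{\beta(t)}{\beta(T)} h(S(T)). \tag{*} \]

Consider a European contingent claim paying $h(S(T))$ at time $T$. The value of this claim at time
$t \in [0, T]$ is

\[ X(t) = \beta(t)\, \mathbb{E}\left[ \frac{1}{\beta(T)} h(S(T)) \,\middle|\, \mathcal{F}(t) \right]. \]

Figure 26.1: Convex payoff function

Therefore,

\[
\begin{aligned}
\frac{X(t)}{\beta(t)} &= \frac{1}{\beta(t)} \mathbb{E}\left[ \frac{\beta(t)}{\beta(T)} h(S(T)) \,\middle|\, \mathcal{F}(t) \right] \\
&\ge \frac{1}{\beta(t)} \mathbb{E}\left[ h\left( \frac{\beta(t)}{\beta(T)} S(T) \right) \,\middle|\, \mathcal{F}(t) \right] && \text{(by (*))} \\
&\ge \frac{1}{\beta(t)} h\left( \beta(t)\, \mathbb{E}\left[ \frac{S(T)}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right] \right) && \text{(Jensen's inequality)} \\
&= \frac{1}{\beta(t)} h\left( \beta(t) \frac{S(t)}{\beta(t)} \right) && \text{($S/\beta$ is a martingale)} \\
&= \frac{1}{\beta(t)} h(S(t)).
\end{aligned}
\]

This shows that the value $X(t)$ of the European contingent claim dominates the intrinsic value
$h(S(t))$ of the American claim. In fact, except in degenerate cases, the inequality

\[ X(t) \ge h(S(t)), \quad 0 \le t \le T, \]

is strict, i.e., the American claim should not be exercised prior to expiration.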


26.2 Dividend paying stock 


Let $r$ and $\sigma$ be constant, let $\delta$ be a "dividend coefficient" satisfying

\[ 0 < \delta < 1. \]

Let $T > 0$ be an expiration time, and let $t_1 \in (0, T)$ be the time of dividend payment. The stock
price is given by

\[ S(t) = \begin{cases}
S(0) \exp\{ (r - \tfrac12 \sigma^2) t + \sigma B(t) \}, & 0 \le t \le t_1, \\
(1 - \delta) S(t_1) \exp\{ (r - \tfrac12 \sigma^2)(t - t_1) + \sigma (B(t) - B(t_1)) \}, & t_1 < t \le T.
\end{cases} \]

Consider an American call on this stock. At times $t \in (t_1, T)$, it is not optimal to exercise, so the
value of the call is given by the usual Black-Scholes formula

\[ v(t, x) = x N(d_+(T - t, x)) - K e^{-r(T - t)} N(d_-(T - t, x)), \quad t_1 \le t \le T, \]

where

\[ d_\pm(T - t, x) = \frac{1}{\sigma\sqrt{T - t}} \left[ \log \frac{x}{K} + (T - t)(r \pm \sigma^2/2) \right]. \]


At time $t_1$, immediately after payment of the dividend, the value of the call is

\[ v(t_1, (1 - \delta) S(t_1)). \]

At time $t_1$, immediately before payment of the dividend, the value of the call is

\[ w(t_1, S(t_1)), \]

where

\[ w(t_1, x) = \max\left\{ (x - K)^+,\ v(t_1, (1 - \delta) x) \right\}. \]
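The pre-dividend value $w(t_1, x)$ is simple to evaluate once the Black-Scholes formula is coded. The following sketch uses hypothetical function names and takes the remaining time $T - t_1$ as the argument `tau`; it also allows a quick numerical look at the convexity of $w(t_1, \cdot)$, which is what the proof of the theorem that follows relies on.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function N."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, strike, r, sigma, tau):
    """Black-Scholes call value x N(d+) - K exp(-r tau) N(d-); zero when x = 0."""
    if x <= 0.0:
        return 0.0
    d_plus = (math.log(x / strike) + tau * (r + 0.5 * sigma ** 2)) / (sigma * math.sqrt(tau))
    d_minus = (math.log(x / strike) + tau * (r - 0.5 * sigma ** 2)) / (sigma * math.sqrt(tau))
    return x * norm_cdf(d_plus) - strike * math.exp(-r * tau) * norm_cdf(d_minus)


def call_value_before_dividend(x, strike, r, sigma, delta, tau):
    """w(t1, x) = max{(x - K)^+, v(t1, (1 - delta) x)} with tau = T - t1."""
    exercise_now = max(x - strike, 0.0)
    hold_through_dividend = bs_call((1.0 - delta) * x, strike, r, sigma, tau)
    return max(exercise_now, hold_through_dividend)
```

For example, with $K = 100$, $r = 0.05$, $\sigma = 0.2$, $\delta = 0.1$ and half a year remaining, a stock price of 200 makes immediate exercise the larger of the two terms.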


Theorem 2.65 For $0 \le t \le t_1$, the value of the American call is $w(t, S(t))$, where

\[ w(t, x) = \mathbb{E}^{t,x}\left[ e^{-r(t_1 - t)} w(t_1, S(t_1)) \right]. \]

This function satisfies the usual Black-Scholes equation

\[ -rw + w_t + rx w_x + \tfrac12 \sigma^2 x^2 w_{xx} = 0, \quad 0 \le t \le t_1,\ x \ge 0, \]

(where $w = w(t, x)$) with terminal condition

\[ w(t_1, x) = \max\{ (x - K)^+,\ v(t_1, (1 - \delta) x) \}, \quad x \ge 0, \]

and boundary condition

\[ w(t, 0) = 0, \quad 0 \le t \le T. \]

The hedging portfolio is

\[ \Delta(t) = \begin{cases} w_x(t, S(t)), & 0 \le t \le t_1, \\ v_x(t, S(t)), & t_1 < t \le T. \end{cases} \]

Proof: We only need to show that an American contingent claim with payoff $w(t_1, S(t_1))$ at time
$t_1$ need not be exercised before time $t_1$. According to Theorem 1.64, it suffices to prove

1. $w(t_1, 0) = 0$,

2. $w(t_1, x)$ is convex in $x$.

Since $v(t_1, 0) = 0$, we have immediately that

\[ w(t_1, 0) = \max\{ (0 - K)^+,\ v(t_1, (1 - \delta) 0) \} = 0. \]

To prove that $w(t_1, x)$ is convex in $x$, we need to show that $v(t_1, (1 - \delta) x)$ is convex in $x$. Obviously,
$(x - K)^+$ is convex in $x$, and the maximum of two convex functions is convex. The proof of the
convexity of $v(t_1, (1 - \delta) x)$ in $x$ is left as a homework problem.


26.3 Hedging at time $t_1$


Let $x = S(t_1)$.

Case I: $v(t_1, (1 - \delta) x) \ge (x - K)^+$.

The option need not be exercised at time $t_1$ (should not be exercised if the inequality is strict). We
have

\[ w(t_1, x) = v(t_1, (1 - \delta) x), \]
\[ \Delta(t_1) = w_x(t_1, x) = (1 - \delta)\, v_x(t_1, (1 - \delta) x) = (1 - \delta)\, \Delta(t_1+), \]

where

\[ \Delta(t_1+) = \lim_{t \downarrow t_1} \Delta(t) \]

is the number of shares of stock held by the hedge immediately after payment of the dividend. The
post-dividend position can be achieved by reinvesting in stock the dividends received on the stock
held in the hedge. Indeed,

\[
\begin{aligned}
\Delta(t_1+) &= \frac{1}{1 - \delta} \Delta(t_1) = \Delta(t_1) + \frac{\delta}{1 - \delta} \Delta(t_1) \\
&= \Delta(t_1) + \frac{\delta\, \Delta(t_1) S(t_1)}{(1 - \delta) S(t_1)} \\
&= \text{\# of shares held when dividend is paid} + \frac{\text{dividends received}}{\text{price per share when dividend is reinvested}}.
\end{aligned}
\]

Case II: $v(t_1, (1 - \delta) x) < (x - K)^+$.

The owner of the option should exercise before the dividend payment at time $t_1$ and receive $(x - K)$.
The hedge has been constructed so the seller of the option has $x - K$ before the dividend payment
at time $t_1$. If the option is not exercised, its value drops from $x - K$ to $v(t_1, (1 - \delta) x)$, and the seller
of the option can pocket the difference and continue the hedge.


Chapter 27 


Bonds, forward contracts and futures 


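Let $\{W(t), \mathcal{F}(t);\ 0 \le t \le T\}$ be a Brownian motion (Wiener process) on some $(\Omega, \mathcal{F}, \mathbb{P})$. Consider
an asset, which we call a stock, whose price satisfies

\[ dS(t) = r(t) S(t)\, dt + \sigma(t) S(t)\, dW(t). \]

Here, $r$ and $\sigma$ are adapted processes, and we have already switched to the risk-neutral measure,
which we call $\mathbb{P}$. Assume that every martingale under $\mathbb{P}$ can be represented as an integral with
respect to $W$.

Define the accumulation factor

\[ \beta(t) = \exp\left\{ \int_0^t r(u)\, du \right\}. \]

A zero-coupon bond, maturing at time $T$, pays 1 at time $T$ and nothing before time $T$. According
to the risk-neutral pricing formula, its value at time $t \in [0, T]$ is

\[ B(t, T) = \beta(t)\, \mathbb{E}\left[ \frac{1}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right]. \]

Given $B(t, T)$ dollars at time $t$, one can construct a portfolio of investment in the stock and money
market so that the portfolio value at time $T$ is 1 almost surely. Indeed, for some process $\gamma$,

\[
\begin{aligned}
B(t, T) &= \beta(t) \underbrace{\mathbb{E}\left[ \frac{1}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right]}_{\text{martingale}} \\
&= \beta(t) \left[ \mathbb{E}\left( \frac{1}{\beta(T)} \right) + \int_0^t \gamma(u)\, dW(u) \right] \\
&= \beta(t) \left[ B(0, T) + \int_0^t \gamma(u)\, dW(u) \right],
\end{aligned}
\]

\[
\begin{aligned}
dB(t, T) &= r(t) \beta(t) \left[ B(0, T) + \int_0^t \gamma(u)\, dW(u) \right] dt + \beta(t) \gamma(t)\, dW(t) \\
&= r(t) B(t, T)\, dt + \beta(t) \gamma(t)\, dW(t).
\end{aligned}
\]

The value of a portfolio satisfies

\[
\begin{aligned}
dX(t) &= \Delta(t)\, dS(t) + r(t) [X(t) - \Delta(t) S(t)]\, dt \\
&= r(t) X(t)\, dt + \Delta(t) \sigma(t) S(t)\, dW(t).
\end{aligned}
\]

We set

\[ \Delta(t) = \frac{\beta(t) \gamma(t)}{\sigma(t) S(t)}. \]

If, at any time $t$, $X(t) = B(t, T)$ and we use the portfolio $\Delta(u)$, $t \le u \le T$, then we will have

\[ X(T) = B(T, T) = 1. \]

If $r(t)$ is nonrandom for all $t$, then

\[ B(t, T) = \exp\left\{ -\int_t^T r(u)\, du \right\}, \]
\[ dB(t, T) = r(t) B(t, T)\, dt, \]

i.e., $\gamma = 0$. Then $\Delta$ given above is zero. If, at time $t$, you are given $B(t, T)$ dollars and you always
invest only in the money market, then at time $T$ you will have

\[ B(t, T) \exp\left\{ \int_t^T r(u)\, du \right\} = 1. \]

If $r(t)$ is random for all $t$, then $\gamma$ is not zero. One generally has three different instruments: the
stock, the money market, and the zero coupon bond. Any two of them are sufficient for hedging,
and the two which are most convenient can depend on the instrument being hedged.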


27.1 Forward contracts


We continue with the set-up for zero-coupon bonds. The $T$-forward price of the stock at time
$t \in [0, T]$ is the $\mathcal{F}(t)$-measurable price, agreed upon at time $t$, for purchase of a share of stock at
time $T$, chosen so the forward contract has value zero at time $t$. In other words,

\[ \mathbb{E}\left[ \frac{1}{\beta(T)} (S(T) - F(t)) \,\middle|\, \mathcal{F}(t) \right] = 0, \quad 0 \le t \le T. \]

We solve for $F(t)$:

\[
\begin{aligned}
0 &= \mathbb{E}\left[ \frac{1}{\beta(T)} (S(T) - F(t)) \,\middle|\, \mathcal{F}(t) \right] \\
&= \mathbb{E}\left[ \frac{S(T)}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right] - \frac{F(t)}{\beta(t)} \mathbb{E}\left[ \frac{\beta(t)}{\beta(T)} \,\middle|\, \mathcal{F}(t) \right] \\
&= \frac{S(t)}{\beta(t)} - \frac{F(t)}{\beta(t)} B(t, T).
\end{aligned}
\]

This implies that

\[ F(t) = \frac{S(t)}{B(t, T)}. \]

Remark 27.1 (Value vs. Forward price) The $T$-forward price $F(t)$ is not the value at time $t$ of
the forward contract. The value of the contract at time $t$ is zero. $F(t)$ is the price agreed upon at
time $t$ which will be paid for the stock at time $T$.


27.2 Hedging a forward contract 


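Enter a forward contract at time 0, i.e., agree to pay $F(0) = \frac{S(0)}{B(0,T)}$ for a share of stock at time $T$. At time zero, this contract has value 0. At later times, however, it does not. In fact, its value at time $t \in [0,T]$ is

$$
\begin{aligned}
V(t) &= \beta(t)\,\mathbb{E}\left[\frac{1}{\beta(T)}\bigl(S(T) - F(0)\bigr)\,\Big|\,\mathcal{F}(t)\right] \\
&= \beta(t)\,\mathbb{E}\left[\frac{S(T)}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right] - F(0)\,\mathbb{E}\left[\frac{\beta(t)}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right] \\
&= \beta(t)\,\frac{S(t)}{\beta(t)} - F(0)B(t,T) \\
&= S(t) - F(0)B(t,T).
\end{aligned}
$$

This suggests the following hedge of a short position in the forward contract. At time 0, short $F(0)$ $T$-maturity zero-coupon bonds. This generates income

$$F(0)B(0,T) = \frac{S(0)}{B(0,T)}\,B(0,T) = S(0).$$

Buy one share of stock. This portfolio requires no initial investment. Maintain this position until time $T$, when the portfolio is worth

$$S(T) - F(0)B(T,T) = S(T) - F(0).$$

Deliver the share of stock and receive payment $F(0)$.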


A short position in the forward could also be hedged using the stock and money market, but the 
implementation of this hedge would require a term-structure model. 
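
The static hedge above can be checked with a few lines of arithmetic. The following is a minimal sketch; the market values `s0` and `b0` and the terminal stock prices are hypothetical, and the function names are ours.

```python
def forward_price(spot: float, bond_price: float) -> float:
    """T-forward price F(t) = S(t) / B(t, T)."""
    return spot / bond_price


def forward_value(spot_t: float, bond_t: float, agreed_price: float) -> float:
    """Value V(t) = S(t) - F(0) B(t, T) of a long forward entered at time 0."""
    return spot_t - agreed_price * bond_t


# Hypothetical market data at time 0.
s0, b0 = 100.0, 0.95
f0 = forward_price(s0, b0)

# Hedge of the short forward: short F(0) bonds, buy one share.
initial_cost = s0 - f0 * b0

# At maturity B(T, T) = 1, so the hedge is worth S(T) - F(0) whatever S(T) is.
hedge_payoffs = {s_T: forward_value(s_T, 1.0, f0) for s_T in (80.0, 100.0, 130.0)}
```

The hedge costs nothing at time 0 and is worth exactly $S(T) - F(0)$ at maturity, which is what the short forward position owes.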


27.3 Future contracts


Future contracts are designed to remove the risk of default inherent in forward contracts. Through 
the device of marking to market, the value of the future contract is maintained at zero at all times. 
Thus, either party can close out his/her position at any time. 


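Let us first consider the situation with discrete trading dates

$$0 = t_0 < t_1 < \dots < t_n = T.$$

On each $[t_j, t_{j+1})$, $r$ is constant, so

$$\beta(t_{k+1}) = \exp\left\{\int_0^{t_{k+1}} r(u)\,du\right\} = \exp\left\{\sum_{j=0}^{k} r(t_j)(t_{j+1} - t_j)\right\}$$

is $\mathcal{F}(t_k)$-measurable.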


Enter a future contract at time $t_k$, taking the long position, when the future price is $\Phi(t_k)$. At time
$t_{k+1}$, when the future price is $\Phi(t_{k+1})$, you receive a payment $\Phi(t_{k+1}) - \Phi(t_k)$. (If the price has
fallen, you make the payment $-(\Phi(t_{k+1}) - \Phi(t_k))$.) The mechanism for receiving and making
these payments is the margin account held by the broker.


By time $T = t_n$, you have received the sequence of payments

$$\Phi(t_{k+1}) - \Phi(t_k),\ \Phi(t_{k+2}) - \Phi(t_{k+1}),\ \dots,\ \Phi(t_n) - \Phi(t_{n-1})$$

at times $t_{k+1}, t_{k+2}, \dots, t_n$. The value at time $t = t_k$ of this sequence is

$$\beta(t)\,\mathbb{E}\left[\sum_{j=k}^{n-1} \frac{1}{\beta(t_{j+1})}\bigl(\Phi(t_{j+1}) - \Phi(t_j)\bigr)\,\Big|\,\mathcal{F}(t)\right].$$

Because it costs nothing to enter the future contract at time $t$, this expression must be zero almost surely.

The continuous-time version of this condition is

$$\beta(t)\,\mathbb{E}\left[\int_t^T \frac{1}{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal{F}(t)\right] = 0, \quad 0 \le t \le T.$$

Note that $\beta(t_{j+1})$ appearing in the discrete-time version is $\mathcal{F}(t_j)$-measurable, as it should be when approximating a stochastic integral.


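Definition 27.1 The $T$-future price of the stock is any $\mathcal{F}(t)$-adapted stochastic process

$$\{\Phi(t);\ 0 \le t \le T\},$$

satisfying

$$\Phi(T) = S(T) \ \text{a.s., and} \qquad \text{(a)}$$

$$\mathbb{E}\left[\int_t^T \frac{1}{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal{F}(t)\right] = 0, \quad 0 \le t \le T. \qquad \text{(b)}$$

Theorem 3.66 The unique process satisfying (a) and (b) is

$$\Phi(t) = \mathbb{E}\left[S(T)\,\big|\,\mathcal{F}(t)\right], \quad 0 \le t \le T.$$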


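Proof: We first show that (b) holds if and only if $\Phi$ is a martingale. If $\Phi$ is a martingale, then $\int_0^t \frac{1}{\beta(u)}\,d\Phi(u)$ is also a martingale, so

$$
\mathbb{E}\left[\int_t^T \frac{1}{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal{F}(t)\right]
= \mathbb{E}\left[\int_0^T \frac{1}{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal{F}(t)\right] - \int_0^t \frac{1}{\beta(u)}\,d\Phi(u) = 0.
$$

On the other hand, if (b) holds, then the martingale

$$M(t) = \mathbb{E}\left[\int_0^T \frac{1}{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal{F}(t)\right]$$

satisfies

$$
\begin{aligned}
M(t) &= \int_0^t \frac{1}{\beta(u)}\,d\Phi(u) + \mathbb{E}\left[\int_t^T \frac{1}{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal{F}(t)\right] \\
&= \int_0^t \frac{1}{\beta(u)}\,d\Phi(u), \quad 0 \le t \le T;
\end{aligned}
$$

this implies

$$dM(t) = \frac{1}{\beta(t)}\,d\Phi(t), \qquad d\Phi(t) = \beta(t)\,dM(t),$$

and so $\Phi$ is a martingale (its differential has no $dt$ term).

Now define

$$\Phi(t) = \mathbb{E}\left[S(T)\,\big|\,\mathcal{F}(t)\right], \quad 0 \le t \le T.$$

Clearly (a) is satisfied. By the tower property, $\Phi$ is a martingale, so (b) is also satisfied. Indeed, this $\Phi$ is the only martingale satisfying (a). $\blacksquare$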


27.4 Cash flow from a future contract 


With a forward contract, entered at time 0, the buyer agrees to pay $F(0)$ for an asset valued at $S(T)$. The only payment is at time $T$.

With a future contract, entered at time 0, the buyer receives a cash flow (which may at times be negative) between times 0 and $T$. If he still holds the contract at time $T$, then he pays $S(T)$ at time $T$ for an asset valued at $S(T)$. The cash flow received between times 0 and $T$ sums to

$$\int_0^T d\Phi(u) = \Phi(T) - \Phi(0) = S(T) - \Phi(0).$$

Thus, if the future contract holder takes delivery at time $T$, he has paid a total of

$$(\Phi(0) - S(T)) + S(T) = \Phi(0)$$

for an asset valued at $S(T)$.
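
In discrete time the same bookkeeping is a telescoping sum. The sketch below uses a made-up path of future prices (the list `phi` is purely illustrative, with its last entry playing the role of $\Phi(T) = S(T)$) and ignores interest earned on the margin account.

```python
def futures_cash_flows(future_prices):
    """Marking-to-market payments Phi(t_{k+1}) - Phi(t_k) to the long position."""
    return [later - earlier for earlier, later in zip(future_prices, future_prices[1:])]


# Hypothetical future prices at the trading dates; phi[-1] equals S(T).
phi = [50.0, 52.5, 49.0, 51.0, 55.0]

flows = futures_cash_flows(phi)
total_received = sum(flows)           # telescopes to Phi(T) - Phi(0)
net_paid = phi[-1] - total_received   # pay S(T) at delivery, net of cash received
```

The net amount paid equals $\Phi(0)$, the future price at the time the contract was entered.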


27.5 Forward-future spread 


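Future price: $\Phi(t) = \mathbb{E}\left[S(T)\,\big|\,\mathcal{F}(t)\right]$.

Forward price:

$$F(t) = \frac{S(t)}{B(t,T)} = \frac{S(t)}{\beta(t)\,\mathbb{E}\left[\frac{1}{\beta(T)}\,\big|\,\mathcal{F}(t)\right]}.$$

Forward-future spread:

$$
\Phi(0) - F(0) = \mathbb{E}[S(T)] - \frac{S(0)}{\mathbb{E}\left[\frac{1}{\beta(T)}\right]}
= \frac{1}{\mathbb{E}\left[\frac{1}{\beta(T)}\right]}\left\{\mathbb{E}\left[\frac{1}{\beta(T)}\right]\mathbb{E}[S(T)] - \mathbb{E}\left[\frac{S(T)}{\beta(T)}\right]\right\}.
$$

If $\frac{1}{\beta(T)}$ and $S(T)$ are uncorrelated,

$$\Phi(0) = F(0).$$

If $\frac{1}{\beta(T)}$ and $S(T)$ are positively correlated, then

$$\Phi(0) \le F(0).$$

This is the case that a rise in stock price tends to occur with a fall in the interest rate. The owner
of the future tends to receive income when the stock price rises, but invests it at a declining interest
rate. If the stock price falls, the owner usually must make payments on the future contract. He
withdraws from the money market to do this just as the interest rate rises. In short, the long position
in the future is hurt by positive correlation between $\frac{1}{\beta(T)}$ and $S(T)$. The buyer of the future is
compensated by a reduction of the future price below the forward price.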


27.6 Backwardation and contango 


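Suppose

$$dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t).$$

Define $\theta = \frac{\mu - r}{\sigma}$, $\widetilde{W}(t) = \theta t + W(t)$,

$$Z(T) = \exp\{-\theta W(T) - \tfrac12\theta^2 T\}, \qquad \widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \quad \forall A \in \mathcal{F}(T).$$

Then $\widetilde{W}$ is a Brownian motion under $\widetilde{\mathbb{P}}$, and

$$dS(t) = rS(t)\,dt + \sigma S(t)\,d\widetilde{W}(t).$$

We have

$$\beta(t) = e^{rt}, \qquad S(t) = S(0)\exp\{(\mu - \tfrac12\sigma^2)t + \sigma W(t)\} = S(0)\exp\{(r - \tfrac12\sigma^2)t + \sigma\widetilde{W}(t)\}.$$

The expected future spot price of the stock under $\mathbb{P}$ is

$$\mathbb{E}S(T) = S(0)e^{\mu T}\,\mathbb{E}\left[\exp\{-\tfrac12\sigma^2 T + \sigma W(T)\}\right] = e^{\mu T}S(0).$$

The future price at time 0 is

$$\Phi(0) = e^{rT}S(0).$$

If $\mu > r$, then $\Phi(0) < \mathbb{E}S(T)$. This situation is called normal backwardation (see Hull). If $\mu < r$,
then $\Phi(0) > \mathbb{E}S(T)$. This is called contango.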
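
As a small numerical illustration of the comparison between the future price $e^{rT}S(0)$ and the expected spot price $e^{\mu T}S(0)$, here is a sketch with hypothetical parameter values; the function names are ours.

```python
import math


def future_price_at_zero(s0: float, r: float, T: float) -> float:
    """Phi(0) = exp(rT) S(0) for constant interest rate r."""
    return s0 * math.exp(r * T)


def expected_spot(s0: float, mu: float, T: float) -> float:
    """Expected spot price exp(mu T) S(0) under the market measure."""
    return s0 * math.exp(mu * T)


def regime(mu: float, r: float) -> str:
    """Classify the market by comparing mean rate of return and interest rate."""
    if mu > r:
        return "normal backwardation"
    if mu < r:
        return "contango"
    return "neither"


# Hypothetical parameters: stock drift 8%, interest rate 3%, one year.
s0, mu, r, T = 100.0, 0.08, 0.03, 1.0
phi0 = future_price_at_zero(s0, r, T)
spot_mean = expected_spot(s0, mu, T)
label = regime(mu, r)
```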


Chapter 28 


Term-structure models 


Throughout this discussion, $\{W(t);\ 0 \le t \le T^*\}$ is a Brownian motion on some probability space
$(\Omega, \mathcal{F}, \mathbb{P})$, and $\{\mathcal{F}(t);\ 0 \le t \le T^*\}$ is the filtration generated by $W$.

Suppose we are given an adapted interest rate process $\{r(t);\ 0 \le t \le T^*\}$. We define the accumulation factor

$$\beta(t) = \exp\left\{\int_0^t r(u)\,du\right\}, \quad 0 \le t \le T^*.$$

In a term-structure model, we take the zero-coupon bonds ("zeroes") of various maturities to be the
primitive assets. We assume these bonds are default-free and pay \$1 at maturity. For $0 \le t \le T \le T^*$, let

$B(t,T)$ = price at time $t$ of the zero-coupon bond paying \$1 at time $T$.


Theorem 0.67 (Fundamental Theorem of Asset Pricing) A term structure model is free of arbitrage if and only if there is a probability measure $\widetilde{\mathbb{P}}$ on $\Omega$ (a risk-neutral measure) with the same
probability-zero sets as $\mathbb{P}$ (i.e., equivalent to $\mathbb{P}$), such that for each $T \in (0, T^*]$, the process

$$\frac{B(t,T)}{\beta(t)}, \quad 0 \le t \le T,$$

is a martingale under $\widetilde{\mathbb{P}}$.
Remark 28.1 We shall always have

$$dB(t,T) = \mu(t,T)B(t,T)\,dt + \rho(t,T)B(t,T)\,dW(t), \quad 0 \le t \le T,$$

for some functions $\mu(t,T)$ and $\rho(t,T)$. Therefore

$$
\begin{aligned}
d\left(\frac{B(t,T)}{\beta(t)}\right) &= B(t,T)\,d\left(\frac{1}{\beta(t)}\right) + \frac{1}{\beta(t)}\,dB(t,T) \\
&= \left[\mu(t,T) - r(t)\right]\frac{B(t,T)}{\beta(t)}\,dt + \rho(t,T)\,\frac{B(t,T)}{\beta(t)}\,dW(t),
\end{aligned}
$$

so $\mathbb{P}$ is a risk-neutral measure if and only if $\mu(t,T)$, the mean rate of return of $B(t,T)$ under $\mathbb{P}$, is
the interest rate $r(t)$. If the mean rate of return of $B(t,T)$ under $\mathbb{P}$ is not $r(t)$ at each time $t$ and for
each maturity $T$, we should change to a measure $\widetilde{\mathbb{P}}$ under which the mean rate of return is $r(t)$. If
such a measure does not exist, then the model admits an arbitrage by trading in zero-coupon bonds.


28.1 Computing arbitrage-free bond prices: first method 


Begin with a stochastic differential equation (SDE)

$$dX(t) = a(t, X(t))\,dt + b(t, X(t))\,dW(t).$$

The solution $X(t)$ is the factor. If we want to have $n$ factors, we let $W$ be an $n$-dimensional
Brownian motion and let $X$ be an $n$-dimensional process. We let the interest rate $r(t)$ be a function
of $X(t)$. In the usual one-factor models, we take $r(t)$ to be $X(t)$ (e.g., Cox-Ingersoll-Ross, Hull-White).

Now that we have an interest rate process $\{r(t);\ 0 \le t \le T^*\}$, we define the zero-coupon bond
prices to be

$$B(t,T) = \beta(t)\,\mathbb{E}\left[\frac{1}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right] = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\Big|\,\mathcal{F}(t)\right], \quad 0 \le t \le T \le T^*.$$

We showed in Chapter 27 that

$$dB(t,T) = r(t)B(t,T)\,dt + \beta(t)\gamma(t)\,dW(t)$$

for some process $\gamma$. Since $B(t,T)$ has mean rate of return $r(t)$ under $\mathbb{P}$, $\mathbb{P}$ is a risk-neutral measure
and there is no arbitrage.


28.2 Some interest-rate dependent assets 


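Coupon-paying bond: Payments $P_1, P_2, \dots, P_n$ at times $T_1, T_2, \dots, T_n$. Price at time $t$ is

$$\sum_{\{k:\ t < T_k\}} P_k\,B(t, T_k).$$

Call option on a zero-coupon bond: Bond matures at time $T$. Option expires at time $T_1 < T$.
Price at time $t$ is

$$\beta(t)\,\mathbb{E}\left[\frac{1}{\beta(T_1)}\bigl(B(T_1,T) - K\bigr)^+\,\Big|\,\mathcal{F}(t)\right], \quad 0 \le t \le T_1.$$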
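
The coupon-bond formula is a sum of zero-coupon prices over the payments still to come. A minimal sketch follows; the flat 5% discount curve `flat_curve` and the payment schedule are hypothetical stand-ins for market data.

```python
import math


def coupon_bond_price(t, payments, times, discount):
    """Sum of P_k * B(t, T_k) over the payment dates T_k with t < T_k."""
    return sum(p * discount(t, T_k) for p, T_k in zip(payments, times) if t < T_k)


def flat_curve(t, T, rate=0.05):
    """Hypothetical zero-coupon price B(t, T) for a flat, nonrandom rate."""
    return math.exp(-rate * (T - t))


payments = [5.0, 5.0, 105.0]
times = [1.0, 2.0, 3.0]
price_now = coupon_bond_price(0.0, payments, times, flat_curve)
price_after_first_coupon = coupon_bond_price(1.0, payments, times, flat_curve)
```

Note that a payment made exactly at time $t$ is excluded, matching the strict inequality $t < T_k$ in the index set.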


28.3 Terminology


Definition 28.1 (Term-structure model) Any mathematical model which determines, at least theoretically, the stochastic processes

$$B(t,T), \quad 0 \le t \le T,$$

for all $T \in (0, T^*]$.

Definition 28.2 (Yield to maturity) For $0 \le t \le T \le T^*$, the yield to maturity $Y(t,T)$ is the
$\mathcal{F}(t)$-measurable random variable satisfying

$$B(t,T)\exp\{(T - t)\,Y(t,T)\} = 1,$$

or equivalently,

$$Y(t,T) = -\frac{1}{T - t}\log B(t,T).$$

Determining

$$B(t,T), \quad 0 \le t \le T \le T^*,$$

is equivalent to determining

$$Y(t,T), \quad 0 \le t \le T \le T^*.$$
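
The equivalence between bond prices and yields is just an invertible change of variables, as the following sketch shows. The numerical inputs are hypothetical.

```python
import math


def yield_to_maturity(bond_price: float, t: float, T: float) -> float:
    """Y(t, T) = -log B(t, T) / (T - t), defined for t < T."""
    return -math.log(bond_price) / (T - t)


def bond_price_from_yield(y: float, t: float, T: float) -> float:
    """Inverse map: B(t, T) = exp(-(T - t) Y(t, T))."""
    return math.exp(-(T - t) * y)


# Hypothetical quote: a 5-year zero trading at 0.78.
y = yield_to_maturity(0.78, 0.0, 5.0)
recovered = bond_price_from_yield(y, 0.0, 5.0)
```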


28.4 Forward rate agreement 


Let $0 \le t \le T < T + \epsilon \le T^*$ be given. Suppose you want to borrow \$1 at time $T$ with repayment
(plus interest) at time $T + \epsilon$, at an interest rate agreed upon at time $t$. To synthesize a forward-rate
agreement to do this, at time $t$ buy a $T$-maturity zero and short $\frac{B(t,T)}{B(t,T+\epsilon)}$ $(T + \epsilon)$-maturity zeroes.

The value of this portfolio at time $t$ is

$$B(t,T) - \frac{B(t,T)}{B(t,T+\epsilon)}\,B(t,T+\epsilon) = 0.$$

At time $T$, you receive \$1 from the $T$-maturity zero. At time $T + \epsilon$, you pay \$$\frac{B(t,T)}{B(t,T+\epsilon)}$. The
effective interest rate on the dollar you receive at time $T$ is $R(t,T,T+\epsilon)$ given by

$$\frac{B(t,T)}{B(t,T+\epsilon)} = \exp\{\epsilon\,R(t,T,T+\epsilon)\},$$

or equivalently,

$$R(t,T,T+\epsilon) = -\frac{\log B(t,T+\epsilon) - \log B(t,T)}{\epsilon}.$$

The forward rate is

$$f(t,T) = \lim_{\epsilon \downarrow 0} R(t,T,T+\epsilon) = -\frac{\partial}{\partial T}\log B(t,T). \qquad (4.1)$$
€0 


This is the instantaneous interest rate, agreed upon at time $t$, for money borrowed at time $T$.

Integrating the above equation, we obtain

$$
\begin{aligned}
\int_t^T f(t,u)\,du &= -\int_t^T \frac{\partial}{\partial u}\log B(t,u)\,du \\
&= -\log B(t,u)\Big|_{u=t}^{u=T} \\
&= -\log B(t,T),
\end{aligned}
$$

so

$$B(t,T) = \exp\left\{-\int_t^T f(t,u)\,du\right\}.$$

You can agree at time $t$ to receive interest rate $f(t,u)$ at each time $u \in [t,T]$. If you invest \$$B(t,T)$
at time $t$ and receive interest rate $f(t,u)$ at each time $u$ between $t$ and $T$, this will grow to

$$B(t,T)\exp\left\{\int_t^T f(t,u)\,du\right\} = 1$$

at time $T$.
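
To see the relation between $R(t,T,T+\epsilon)$, $f(t,T)$ and $B(t,T)$ numerically, here is a sketch. The discount curve `discount` is a made-up smooth function chosen so that the exact forward rate is $0.03 + 0.01\,(T - t)$; it is not taken from any model in these notes.

```python
import math


def discount(t: float, T: float) -> float:
    """Hypothetical curve B(t, T) = exp(-(0.03 tau + 0.005 tau^2)), tau = T - t."""
    tau = T - t
    return math.exp(-(0.03 * tau + 0.005 * tau ** 2))


def simple_forward_rate(t: float, T: float, eps: float) -> float:
    """R(t, T, T + eps) = -(log B(t, T + eps) - log B(t, T)) / eps."""
    return -(math.log(discount(t, T + eps)) - math.log(discount(t, T))) / eps


def forward_rate(t: float, T: float, eps: float = 1e-6) -> float:
    """Approximates f(t, T) by R(t, T, T + eps) with a small eps."""
    return simple_forward_rate(t, T, eps)


def bond_from_forwards(t: float, T: float, n: int = 2000) -> float:
    """Rebuilds B(t, T) = exp(-integral of f(t, u) du) with the midpoint rule."""
    h = (T - t) / n
    integral = h * sum(forward_rate(t, t + (i + 0.5) * h) for i in range(n))
    return math.exp(-integral)


f_02 = forward_rate(0.0, 2.0)
rebuilt = bond_from_forwards(0.0, 2.0)
```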


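28.5 Recovering the interest rate $r(t)$ from the forward rate


$$
\begin{aligned}
B(t,T) &= \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\Big|\,\mathcal{F}(t)\right], \\
\frac{\partial}{\partial T}B(t,T) &= \mathbb{E}\left[-r(T)\exp\left\{-\int_t^T r(u)\,du\right\}\Big|\,\mathcal{F}(t)\right], \\
\frac{\partial}{\partial T}B(t,T)\Big|_{T=t} &= \mathbb{E}\left[-r(t)\,\big|\,\mathcal{F}(t)\right] = -r(t).
\end{aligned}
$$

On the other hand,

$$
\begin{aligned}
B(t,T) &= \exp\left\{-\int_t^T f(t,u)\,du\right\}, \\
\frac{\partial}{\partial T}B(t,T) &= -f(t,T)\exp\left\{-\int_t^T f(t,u)\,du\right\}, \\
\frac{\partial}{\partial T}B(t,T)\Big|_{T=t} &= -f(t,t).
\end{aligned}
$$

Conclusion: $r(t) = f(t,t)$.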


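28.6 Computing arbitrage-free bond prices: Heath-Jarrow-Morton method


For each $T \in (0, T^*]$, let the forward rate be given by

$$f(t,T) = f(0,T) + \int_0^t \alpha(u,T)\,du + \int_0^t \sigma(u,T)\,dW(u), \quad 0 \le t \le T.$$

Here $\{\alpha(u,T);\ 0 \le u \le T\}$ and $\{\sigma(u,T);\ 0 \le u \le T\}$ are adapted processes.
In other words,

$$df(t,T) = \alpha(t,T)\,dt + \sigma(t,T)\,dW(t).$$

Recall that

$$B(t,T) = \exp\left\{-\int_t^T f(t,u)\,du\right\}.$$

Now

$$
\begin{aligned}
d\left\{-\int_t^T f(t,u)\,du\right\} &= f(t,t)\,dt - \int_t^T df(t,u)\,du \\
&= r(t)\,dt - \int_t^T \left[\alpha(t,u)\,dt + \sigma(t,u)\,dW(t)\right]du \\
&= r(t)\,dt - \underbrace{\left[\int_t^T \alpha(t,u)\,du\right]}_{\alpha^*(t,T)}dt - \underbrace{\left[\int_t^T \sigma(t,u)\,du\right]}_{\sigma^*(t,T)}dW(t) \\
&= r(t)\,dt - \alpha^*(t,T)\,dt - \sigma^*(t,T)\,dW(t).
\end{aligned}
$$

Let

$$g(x) = e^x, \quad g'(x) = e^x, \quad g''(x) = e^x.$$

Then

$$B(t,T) = g\left(-\int_t^T f(t,u)\,du\right),$$

and

$$
\begin{aligned}
dB(t,T) &= dg\left(-\int_t^T f(t,u)\,du\right) \\
&= g'\left(-\int_t^T f(t,u)\,du\right)\left(r\,dt - \alpha^*\,dt - \sigma^*\,dW\right) + \tfrac12 g''\left(-\int_t^T f(t,u)\,du\right)(\sigma^*)^2\,dt \\
&= B(t,T)\left[r(t) - \alpha^*(t,T) + \tfrac12\bigl(\sigma^*(t,T)\bigr)^2\right]dt - \sigma^*(t,T)B(t,T)\,dW(t).
\end{aligned}
$$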
28.7 Checking for absence of arbitrage 


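$\mathbb{P}$ is a risk-neutral measure if and only if

$$\alpha^*(t,T) = \tfrac12\bigl(\sigma^*(t,T)\bigr)^2, \quad 0 \le t \le T \le T^*,$$

i.e.,

$$\int_t^T \alpha(t,u)\,du = \frac12\left(\int_t^T \sigma(t,u)\,du\right)^2, \quad 0 \le t \le T \le T^*. \qquad (7.1)$$

Differentiating this w.r.t. $T$, we obtain

$$\alpha(t,T) = \sigma(t,T)\int_t^T \sigma(t,u)\,du, \quad 0 \le t \le T \le T^*. \qquad (7.2)$$

Not only does (7.1) imply (7.2), (7.2) also implies (7.1). This will be a homework problem.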


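Suppose (7.1) does not hold. Then $\mathbb{P}$ is not a risk-neutral measure, but there might still be a risk-neutral measure. Let $\{\theta(t);\ 0 \le t \le T^*\}$ be an adapted process, and define

$$
\widetilde{W}(t) = \int_0^t \theta(u)\,du + W(t), \qquad
Z(t) = \exp\left\{-\int_0^t \theta(u)\,dW(u) - \frac12\int_0^t \theta^2(u)\,du\right\}, \qquad
\widetilde{\mathbb{P}}(A) = \int_A Z(T^*)\,d\mathbb{P} \quad \forall A \in \mathcal{F}(T^*).
$$

Then

$$
\begin{aligned}
dB(t,T) &= B(t,T)\left[r(t) - \alpha^*(t,T) + \tfrac12\bigl(\sigma^*(t,T)\bigr)^2\right]dt - \sigma^*(t,T)B(t,T)\,dW(t) \\
&= B(t,T)\left[r(t) - \alpha^*(t,T) + \tfrac12\bigl(\sigma^*(t,T)\bigr)^2 + \sigma^*(t,T)\theta(t)\right]dt \\
&\qquad - \sigma^*(t,T)B(t,T)\,d\widetilde{W}(t), \quad 0 \le t \le T.
\end{aligned}
$$

In order for $B(t,T)$ to have mean rate of return $r(t)$ under $\widetilde{\mathbb{P}}$, we must have

$$\alpha^*(t,T) = \tfrac12\bigl(\sigma^*(t,T)\bigr)^2 + \sigma^*(t,T)\theta(t), \quad 0 \le t \le T \le T^*. \qquad (7.3)$$

Differentiation w.r.t. $T$ yields the equivalent condition

$$\alpha(t,T) = \sigma(t,T)\sigma^*(t,T) + \sigma(t,T)\theta(t), \quad 0 \le t \le T \le T^*. \qquad (7.4)$$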


Theorem 7.68 (Heath-Jarrow-Morton) For each $T \in (0, T^*]$, let $\alpha(u,T)$, $0 \le u \le T$, and
$\sigma(u,T)$, $0 \le u \le T$, be adapted processes, and assume $\sigma(u,T) > 0$ for all $u$ and $T$. Let
$f(0,T)$, $0 \le T \le T^*$, be a deterministic function, and define

$$f(t,T) = f(0,T) + \int_0^t \alpha(u,T)\,du + \int_0^t \sigma(u,T)\,dW(u).$$

Then $f(t,T)$, $0 \le t \le T \le T^*$, is a family of forward rate processes for a term-structure model
without arbitrage if and only if there is an adapted process $\theta(t)$, $0 \le t \le T^*$, satisfying (7.3), or
equivalently, satisfying (7.4).


Remark 28.2 Under $\mathbb{P}$, the zero-coupon bond with maturity $T$ has mean rate of return

$$r(t) - \alpha^*(t,T) + \tfrac12\bigl(\sigma^*(t,T)\bigr)^2$$

and volatility $\sigma^*(t,T)$. The excess mean rate of return, above the interest rate, is

$$-\alpha^*(t,T) + \tfrac12\bigl(\sigma^*(t,T)\bigr)^2,$$

and when normalized by the volatility, this becomes the market price of risk

$$\frac{-\alpha^*(t,T) + \tfrac12\bigl(\sigma^*(t,T)\bigr)^2}{\sigma^*(t,T)}.$$

The no-arbitrage condition is that this market price of risk at time $t$ does not depend on the maturity
$T$ of the bond. We can then set

$$\theta(t) = -\left[\frac{-\alpha^*(t,T) + \tfrac12\bigl(\sigma^*(t,T)\bigr)^2}{\sigma^*(t,T)}\right],$$

and (7.3) is satisfied.
(The remainder of this chapter was taught Mar 21) 


Suppose the market price of risk does not depend on the maturity $T$, so we can solve (7.3) for $\theta$.
Plugging this into the stochastic differential equation for $B(t,T)$, we obtain for every maturity $T$:

$$dB(t,T) = r(t)B(t,T)\,dt - \sigma^*(t,T)B(t,T)\,d\widetilde{W}(t).$$

Because (7.4) is equivalent to (7.3), we may plug (7.4) into the stochastic differential equation for
$f(t,T)$ to obtain, for every maturity $T$:

$$
\begin{aligned}
df(t,T) &= \left[\sigma(t,T)\sigma^*(t,T) + \sigma(t,T)\theta(t)\right]dt + \sigma(t,T)\,dW(t) \\
&= \sigma(t,T)\sigma^*(t,T)\,dt + \sigma(t,T)\,d\widetilde{W}(t).
\end{aligned}
$$
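
The drift restriction can be checked numerically for a concrete volatility. In the sketch below the exponentially damped volatility `sigma`, its parameters, and the constant `theta` are hypothetical choices; the code builds $\alpha$ from (7.4) and confirms by quadrature that the integrated condition (7.3) then holds.

```python
import math

A_DECAY, S_LEVEL = 0.5, 0.01  # hypothetical volatility parameters


def sigma(t, T):
    """Assumed forward-rate volatility sigma(t, T) = s * exp(-a (T - t))."""
    return S_LEVEL * math.exp(-A_DECAY * (T - t))


def integrate(f, lo, hi, n=400):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def sigma_star(t, T):
    """sigma*(t, T) = integral of sigma(t, u) du over [t, T]."""
    return integrate(lambda u: sigma(t, u), t, T)


def alpha(t, T, theta=0.0):
    """Drift given by (7.4): sigma * sigma_star + sigma * theta."""
    return sigma(t, T) * sigma_star(t, T) + sigma(t, T) * theta


def alpha_star(t, T, theta=0.0):
    """alpha*(t, T) = integral of alpha(t, u) du over [t, T]."""
    return integrate(lambda u: alpha(t, u, theta), t, T)


t, T, theta = 0.0, 5.0, 0.3
lhs = alpha_star(t, T, theta)                                    # left side of (7.3)
rhs = 0.5 * sigma_star(t, T) ** 2 + sigma_star(t, T) * theta     # right side of (7.3)
```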


28.8 Implementation of the Heath-Jarrow-Morton model 


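Choose

$$\sigma(t,T), \quad 0 \le t \le T \le T^*, \qquad \theta(t), \quad 0 \le t \le T^*.$$

These may be stochastic processes, but are usually taken to be deterministic functions. Define

$$\alpha(t,T) = \sigma(t,T)\sigma^*(t,T) + \sigma(t,T)\theta(t), \qquad \widetilde{W}(t) = \int_0^t \theta(u)\,du + W(t).$$

Let $f(0,T)$, $0 \le T \le T^*$, be determined by the market; recall from equation (4.1):

$$f(0,T) = -\frac{\partial}{\partial T}\log B(0,T), \quad 0 \le T \le T^*.$$

Then $f(t,T)$ for $0 \le t \le T$ is determined by the equation

$$df(t,T) = \sigma(t,T)\sigma^*(t,T)\,dt + \sigma(t,T)\,d\widetilde{W}(t), \qquad (8.1)$$

this determines the interest rate process

$$r(t) = f(t,t), \quad 0 \le t \le T^*, \qquad (8.2)$$

and then the zero-coupon bond prices are determined by the initial conditions $B(0,T)$, $0 \le T \le T^*$, gotten from the market, combined with the stochastic differential equation

$$dB(t,T) = r(t)B(t,T)\,dt - \sigma^*(t,T)B(t,T)\,d\widetilde{W}(t). \qquad (8.3)$$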


Because all pricing of interest rate dependent assets will be done under the risk-neutral measure $\widetilde{\mathbb{P}}$,
under which $\widetilde{W}$ is a Brownian motion, we have written (8.1) and (8.3) in terms of $\widetilde{W}$ rather than
$W$. Written this way, it is apparent that neither $\theta(t)$ nor $\alpha(t,T)$ will enter subsequent computations.
The only process which matters is $\sigma(t,T)$, $0 \le t \le T \le T^*$, and the process

$$\sigma^*(t,T) = \int_t^T \sigma(t,u)\,du, \quad 0 \le t \le T \le T^*, \qquad (8.4)$$

obtained from $\sigma(t,T)$.

From (8.3) we see that $\sigma^*(t,T)$ is the volatility at time $t$ of the zero coupon bond maturing at time
$T$. Equation (8.4) implies

$$\sigma^*(T,T) = 0, \quad 0 \le T \le T^*. \qquad (8.5)$$

This is because $B(T,T) = 1$ and so as $t$ approaches $T$ (from below), the volatility in $B(t,T)$ must
vanish.


In conclusion, to implement the HJM model, it suffices to have the initial market data $B(0,T)$, $0 \le T \le T^*$, and the volatilities

$$\sigma^*(t,T), \quad 0 \le t \le T \le T^*.$$

We require that $\sigma^*(t,T)$ be differentiable in $T$ and satisfy (8.5). We can then define

$$\sigma(t,T) = \frac{\partial}{\partial T}\sigma^*(t,T),$$

and (8.4) will be satisfied because

$$\sigma^*(t,T) = \sigma^*(t,T) - \sigma^*(t,t) = \int_t^T \frac{\partial}{\partial u}\sigma^*(t,u)\,du.$$

We then let $\widetilde{W}$ be a Brownian motion under a probability measure $\widetilde{\mathbb{P}}$, and we let $B(t,T)$, $0 \le t \le T \le T^*$, be given by (8.3), where $r(t)$ is given by (8.2) and $f(t,T)$ by (8.1). In (8.1) we use the
initial conditions

$$f(0,T) = -\frac{\partial}{\partial T}\log B(0,T), \quad 0 \le T \le T^*.$$


Remark 28.3 It is customary in the literature to write $W$ rather than $\widetilde{W}$ and $\mathbb{P}$ rather than $\widetilde{\mathbb{P}}$,
so that $\mathbb{P}$ is the symbol used for the risk-neutral measure and no reference is ever made to the
market measure. The only parameter which must be estimated from the market is the bond volatility
$\sigma^*(t,T)$, and volatility is unaffected by the change of measure.


Chapter 29 


Gaussian processes 


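Definition 29.1 (Gaussian Process) A Gaussian process $X(t)$, $t \ge 0$, is a stochastic process with
the property that for every set of times $0 \le t_1 \le t_2 \le \dots \le t_n$, the set of random variables

$$X(t_1), X(t_2), \dots, X(t_n)$$

is jointly normally distributed.

Remark 29.1 If $X$ is a Gaussian process, then its distribution is determined by its mean function

$$m(t) = \mathbb{E}X(t)$$

and its covariance function

$$\rho(s,t) = \mathbb{E}\left[(X(s) - m(s)) \cdot (X(t) - m(t))\right].$$

Indeed, the joint density of $X(t_1), \dots, X(t_n)$ is

$$
\mathbb{P}\{X(t_1) \in dx_1, \dots, X(t_n) \in dx_n\}
= \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}}\exp\left\{-\tfrac12 (x - m(t))\,\Sigma^{-1}\,(x - m(t))^T\right\}dx_1 \dots dx_n,
$$

where $\Sigma$ is the covariance matrix

$$
\Sigma = \begin{bmatrix}
\rho(t_1,t_1) & \rho(t_1,t_2) & \dots & \rho(t_1,t_n) \\
\rho(t_2,t_1) & \rho(t_2,t_2) & \dots & \rho(t_2,t_n) \\
\vdots & \vdots & & \vdots \\
\rho(t_n,t_1) & \rho(t_n,t_2) & \dots & \rho(t_n,t_n)
\end{bmatrix},
$$

$x$ is the row vector $[x_1, x_2, \dots, x_n]$, $t$ is the row vector $[t_1, t_2, \dots, t_n]$, and $m(t) = [m(t_1), m(t_2), \dots, m(t_n)]$.

The moment generating function is

$$\mathbb{E}\exp\left\{\sum_{k=1}^n u_k X(t_k)\right\} = \exp\left\{u \cdot m(t)^T + \tfrac12\,u\,\Sigma\,u^T\right\},$$

where $u = [u_1, u_2, \dots, u_n]$.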
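29.1 An example: Brownian Motion


Brownian motion $W$ is a Gaussian process with $m(t) = 0$ and $\rho(s,t) = s \wedge t$. Indeed, if $0 \le s \le t$,
then

$$
\begin{aligned}
\rho(s,t) = \mathbb{E}\left[W(s)W(t)\right] &= \mathbb{E}\left[W(s)\bigl(W(t) - W(s)\bigr) + W^2(s)\right] \\
&= \mathbb{E}W(s)\cdot\mathbb{E}\bigl(W(t) - W(s)\bigr) + \mathbb{E}W^2(s) \\
&= \mathbb{E}W^2(s) \\
&= s \wedge t.
\end{aligned}
$$

To prove that a process is Gaussian, one must show that $X(t_1), \dots, X(t_n)$ has either a density or a
moment generating function of the appropriate form. We shall use the m.g.f., and shall cheat a bit
by considering only two times, which we usually call $s$ and $t$. We will want to show that

$$
\mathbb{E}\exp\{u_1 X(s) + u_2 X(t)\} = \exp\left\{u_1 m_1 + u_2 m_2 + \tfrac12\begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\right\}.
$$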


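Theorem 1.69 (Integral w.r.t. a Brownian) Let $W(t)$ be a Brownian motion and $\delta(t)$ a nonrandom function. Then

$$X(t) = \int_0^t \delta(u)\,dW(u)$$

is a Gaussian process with $m(t) = 0$ and

$$\rho(s,t) = \int_0^{s \wedge t} \delta^2(u)\,du.$$

Proof: (Sketch.) We have

$$dX = \delta\,dW.$$

Therefore,

$$
\begin{aligned}
de^{uX(s)} &= ue^{uX(s)}\delta(s)\,dW(s) + \tfrac12 u^2 e^{uX(s)}\delta^2(s)\,ds, \\
e^{uX(s)} &= e^{uX(0)} + \underbrace{u\int_0^s e^{uX(v)}\delta(v)\,dW(v)}_{\text{martingale}} + \tfrac12 u^2\int_0^s e^{uX(v)}\delta^2(v)\,dv, \\
\mathbb{E}e^{uX(s)} &= 1 + \tfrac12 u^2\int_0^s \delta^2(v)\,\mathbb{E}e^{uX(v)}\,dv, \\
\frac{d}{ds}\mathbb{E}e^{uX(s)} &= \tfrac12 u^2\delta^2(s)\,\mathbb{E}e^{uX(s)}, \\
\mathbb{E}e^{uX(s)} &= e^{uX(0)}\exp\left\{\tfrac12 u^2\int_0^s \delta^2(v)\,dv\right\} \qquad (1.1) \\
&= \exp\left\{\tfrac12 u^2\int_0^s \delta^2(v)\,dv\right\}.
\end{aligned}
$$

This shows that $X(s)$ is normal with mean 0 and variance $\int_0^s \delta^2(v)\,dv$.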


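Now let $0 \le s < t$ be given. Just as before,

$$de^{uX(t)} = ue^{uX(t)}\delta(t)\,dW(t) + \tfrac12 u^2 e^{uX(t)}\delta^2(t)\,dt.$$

Integrate from $s$ to $t$ to get

$$e^{uX(t)} = e^{uX(s)} + u\int_s^t \delta(v)e^{uX(v)}\,dW(v) + \tfrac12 u^2\int_s^t \delta^2(v)e^{uX(v)}\,dv.$$

Take $\mathbb{E}[\,\dots\,|\,\mathcal{F}(s)]$ conditional expectations and use the martingale property

$$
\mathbb{E}\left[\int_s^t \delta(v)e^{uX(v)}\,dW(v)\,\Big|\,\mathcal{F}(s)\right]
= \mathbb{E}\left[\int_0^t \delta(v)e^{uX(v)}\,dW(v)\,\Big|\,\mathcal{F}(s)\right] - \int_0^s \delta(v)e^{uX(v)}\,dW(v) = 0
$$

to get

$$
\begin{aligned}
\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right] &= e^{uX(s)} + \tfrac12 u^2\int_s^t \delta^2(v)\,\mathbb{E}\left[e^{uX(v)}\,\big|\,\mathcal{F}(s)\right]dv, \\
\frac{d}{dt}\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right] &= \tfrac12 u^2\delta^2(t)\,\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right], \quad t \ge s.
\end{aligned}
$$

The solution to this ordinary differential equation with initial time $s$ is

$$\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right] = e^{uX(s)}\exp\left\{\tfrac12 u^2\int_s^t \delta^2(v)\,dv\right\}, \quad t \ge s. \qquad (1.2)$$

We now compute the m.g.f. for $(X(s), X(t))$, where $0 \le s \le t$:

$$
\begin{aligned}
\mathbb{E}\left[e^{u_1X(s) + u_2X(t)}\,\big|\,\mathcal{F}(s)\right] &= e^{u_1X(s)}\,\mathbb{E}\left[e^{u_2X(t)}\,\big|\,\mathcal{F}(s)\right] \\
&\overset{(1.2)}{=} e^{(u_1+u_2)X(s)}\exp\left\{\tfrac12 u_2^2\int_s^t \delta^2(v)\,dv\right\}, \\
\mathbb{E}\left[e^{u_1X(s) + u_2X(t)}\right] &= \mathbb{E}\left\{\mathbb{E}\left[e^{u_1X(s) + u_2X(t)}\,\big|\,\mathcal{F}(s)\right]\right\} \\
&= \mathbb{E}\left\{e^{(u_1+u_2)X(s)}\right\}\cdot\exp\left\{\tfrac12 u_2^2\int_s^t \delta^2(v)\,dv\right\} \\
&\overset{(1.1)}{=} \exp\left\{\tfrac12 (u_1+u_2)^2\int_0^s \delta^2(v)\,dv + \tfrac12 u_2^2\int_s^t \delta^2(v)\,dv\right\} \\
&= \exp\left\{\tfrac12 (u_1^2 + 2u_1u_2)\int_0^s \delta^2(v)\,dv + \tfrac12 u_2^2\int_0^t \delta^2(v)\,dv\right\} \\
&= \exp\left\{\tfrac12\begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix} \int_0^s \delta^2 & \int_0^s \delta^2 \\ \int_0^s \delta^2 & \int_0^t \delta^2 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\right\}.
\end{aligned}
$$

This shows that $(X(s), X(t))$ is jointly normal with $\mathbb{E}X(s) = \mathbb{E}X(t) = 0$,

$$\mathbb{E}X^2(s) = \int_0^s \delta^2(v)\,dv, \qquad \mathbb{E}X^2(t) = \int_0^t \delta^2(v)\,dv, \qquad \mathbb{E}[X(s)X(t)] = \int_0^s \delta^2(v)\,dv.$$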


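Remark 29.2 The hard part of the above argument, and the reason we use moment generating
functions, is to prove the normality. The computation of means and variances does not require the
use of moment generating functions. Indeed,

$$X(t) = \int_0^t \delta(u)\,dW(u)$$

is a martingale and $X(0) = 0$, so

$$m(t) = \mathbb{E}X(t) = 0 \quad \forall t \ge 0.$$

For fixed $s \ge 0$,

$$\mathbb{E}X^2(s) = \int_0^s \delta^2(v)\,dv$$

by the Itô isometry. For $0 \le s \le t$, the martingale property gives

$$\mathbb{E}\left[X(s)\bigl(X(t) - X(s)\bigr)\right] = \mathbb{E}\left[X(s)\bigl(\mathbb{E}[X(t)\,|\,\mathcal{F}(s)] - X(s)\bigr)\right] = 0.$$

Therefore,

$$\mathbb{E}[X(s)X(t)] = \mathbb{E}\left[X(s)\bigl(X(t) - X(s)\bigr) + X^2(s)\right] = \mathbb{E}X^2(s) = \int_0^s \delta^2(v)\,dv.$$

If $\delta$ were a stochastic process, the Itô isometry says

$$\mathbb{E}X^2(s) = \int_0^s \mathbb{E}\delta^2(v)\,dv$$

and the same argument used above shows that for $0 \le s \le t$,

$$\mathbb{E}[X(s)X(t)] = \mathbb{E}X^2(s) = \int_0^s \mathbb{E}\delta^2(v)\,dv.$$

However, when $\delta$ is stochastic, $X$ is not necessarily a Gaussian process, so its distribution is not
determined from its mean and covariance functions.

Remark 29.3 When $\delta$ is nonrandom,

$$X(t) = \int_0^t \delta(u)\,dW(u)$$

is also Markov. We proved this before, but note again that the Markov property follows immediately
from (1.2). The equation (1.2) says that conditioned on $\mathcal{F}(s)$, the distribution of $X(t)$ depends only
on $X(s)$; in fact, $X(t)$ is normal with mean $X(s)$ and variance $\int_s^t \delta^2(v)\,dv$.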
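
A quick Monte Carlo experiment can make the mean and variance of $\int_0^t \delta(u)\,dW(u)$ concrete. The sketch below is only an illustration: the integrand `delta`, the step count, the sample size and the seed are all arbitrary choices of ours, and the stochastic integral is replaced by a finite sum over Brownian increments.

```python
import math
import random


def delta(u: float) -> float:
    """Hypothetical nonrandom integrand."""
    return 1.0 + u


def sample_X(t: float, steps: int, rng: random.Random) -> float:
    """One sample of the discretized integral of delta dW over [0, t].

    delta is deterministic, so evaluating it at the midpoint of each step
    (rather than the left endpoint) does not change the limit.
    """
    h = t / steps
    x = 0.0
    for k in range(steps):
        x += delta((k + 0.5) * h) * rng.gauss(0.0, math.sqrt(h))
    return x


rng = random.Random(0)
t = 1.0
samples = [sample_X(t, 40, rng) for _ in range(5000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
exact_var = ((1.0 + t) ** 3 - 1.0) / 3.0   # integral of (1 + u)^2 over [0, t]
```

The sample mean should be close to 0 and the sample variance close to `exact_var`, up to Monte Carlo error.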


Figure 29.1: Range of values of y, z, v for the integrals in the proof of Theorem 1.70.


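Theorem 1.70 Let $W(t)$ be a Brownian motion, and let $\delta(t)$ and $h(t)$ be nonrandom functions.
Define

$$X(t) = \int_0^t \delta(u)\,dW(u), \qquad Y(t) = \int_0^t h(u)X(u)\,du.$$

Then $Y$ is a Gaussian process with mean function $m_Y(t) = 0$ and covariance function

$$\rho_Y(s,t) = \int_0^{s \wedge t} \delta^2(v)\left(\int_v^s h(y)\,dy\right)\left(\int_v^t h(y)\,dy\right)dv. \qquad (1.3)$$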


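Proof: (Partial) Computation of $\rho_Y(s,t)$: Let $0 \le s \le t$ be given. It is shown in a homework
problem that $(Y(s), Y(t))$ is a jointly normal pair of random variables. Here we observe that

$$m_Y(t) = \mathbb{E}Y(t) = \int_0^t h(u)\,\mathbb{E}X(u)\,du = 0,$$

and we verify that (1.3) holds.

We have

$$
\begin{aligned}
\rho_Y(s,t) &= \mathbb{E}\left[Y(s)Y(t)\right] \\
&= \mathbb{E}\left[\int_0^s h(y)X(y)\,dy \cdot \int_0^t h(z)X(z)\,dz\right] \\
&= \mathbb{E}\int_0^s\int_0^t h(y)h(z)X(y)X(z)\,dy\,dz \\
&= \int_0^s\int_0^t h(y)h(z)\,\mathbb{E}\left[X(y)X(z)\right]dy\,dz \\
&= \int_0^s\int_0^t h(y)h(z)\int_0^{y \wedge z}\delta^2(v)\,dv\,dy\,dz \\
&= \int_0^s \delta^2(v)\left(\int_v^s h(y)\,dy\right)\left(\int_v^t h(z)\,dz\right)dv,
\end{aligned}
$$

where the last step interchanges the order of integration (see Fig. 29.1).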


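Remark 29.4 Unlike the process $X(t) = \int_0^t \delta(u)\,dW(u)$, the process $Y(t) = \int_0^t h(u)X(u)\,du$ is
neither Markov nor a martingale. For $0 \le s < t$,

$$
\begin{aligned}
\mathbb{E}\left[Y(t)\,|\,\mathcal{F}(s)\right] &= \int_0^s h(u)X(u)\,du + \mathbb{E}\left[\int_s^t h(u)X(u)\,du\,\Big|\,\mathcal{F}(s)\right] \\
&= Y(s) + \int_s^t h(u)\,\mathbb{E}\left[X(u)\,|\,\mathcal{F}(s)\right]du \\
&= Y(s) + X(s)\int_s^t h(u)\,du,
\end{aligned}
$$

where we have used the fact that $X$ is a martingale. The conditional expectation $\mathbb{E}[Y(t)\,|\,\mathcal{F}(s)]$ is
not equal to $Y(s)$, nor is it a function of $Y(s)$ alone.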


Chapter 30 


Hull and White model 


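Consider

$$dr(t) = (\alpha(t) - \beta(t)r(t))\,dt + \sigma(t)\,dW(t),$$

where $\alpha(t)$, $\beta(t)$ and $\sigma(t)$ are nonrandom functions of $t$.

We can solve the stochastic differential equation. Set

$$K(t) = \int_0^t \beta(u)\,du.$$

Then

$$d\left(e^{K(t)}r(t)\right) = e^{K(t)}\left(\beta(t)r(t)\,dt + dr(t)\right) = e^{K(t)}\left(\alpha(t)\,dt + \sigma(t)\,dW(t)\right).$$

Integrating, we get

$$
\begin{aligned}
e^{K(t)}r(t) &= r(0) + \int_0^t e^{K(u)}\alpha(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW(u), \\
r(t) &= e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW(u)\right].
\end{aligned}
$$

From Theorem 1.69 in Chapter 29, we see that $r(t)$ is a Gaussian process with mean function

$$m_r(t) = e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right] \qquad (0.1)$$

and covariance function

$$\rho_r(s,t) = e^{-K(s) - K(t)}\int_0^{s \wedge t} e^{2K(u)}\sigma^2(u)\,du. \qquad (0.2)$$

The process $r(t)$ is also Markov.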
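
The mean and covariance functions (0.1) and (0.2) are easy to evaluate by quadrature. The sketch below does this for hypothetical constant coefficients, for which the integrals can also be done by hand, so the two can be compared; the helper names and parameter values are ours.

```python
import math

r0 = 0.03                       # hypothetical initial rate
alpha = lambda t: 0.02          # hypothetical nonrandom coefficient functions
beta = lambda t: 0.5
sigma = lambda t: 0.01


def integrate(f, lo, hi, n=300):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def K(t):
    """K(t) = integral of beta over [0, t]."""
    return integrate(beta, 0.0, t)


def mean_r(t):
    """Mean function (0.1)."""
    return math.exp(-K(t)) * (r0 + integrate(lambda u: math.exp(K(u)) * alpha(u), 0.0, t))


def cov_r(s, t):
    """Covariance function (0.2)."""
    m = min(s, t)
    inner = integrate(lambda u: math.exp(2.0 * K(u)) * sigma(u) ** 2, 0.0, m)
    return math.exp(-K(s) - K(t)) * inner


# Closed forms for constant coefficients a, b, s.
a, b, s = 0.02, 0.5, 0.01
t = 2.0
mean_closed = math.exp(-b * t) * r0 + (a / b) * (1.0 - math.exp(-b * t))
var_closed = s ** 2 * (1.0 - math.exp(-2.0 * b * t)) / (2.0 * b)
```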


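We want to study $\int_0^T r(t)\,dt$. To do this, we define

$$X(t) = \int_0^t e^{K(u)}\sigma(u)\,dW(u), \qquad Y(T) = \int_0^T e^{-K(t)}X(t)\,dt.$$

Then

$$
\begin{aligned}
r(t) &= e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right] + e^{-K(t)}X(t), \\
\int_0^T r(t)\,dt &= \int_0^T e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right]dt + Y(T).
\end{aligned}
$$

According to Theorem 1.70 in Chapter 29, $\int_0^T r(t)\,dt$ is normal. Its mean is

$$\mathbb{E}\int_0^T r(t)\,dt = \int_0^T e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right]dt, \qquad (0.3)$$

and its variance is

$$\operatorname{var}\left(\int_0^T r(t)\,dt\right) = \mathbb{E}Y^2(T) = \int_0^T e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2 dv.$$

The price at time 0 of a zero-coupon bond paying \$1 at time $T$ is

$$
\begin{aligned}
B(0,T) &= \mathbb{E}\exp\left\{-\int_0^T r(t)\,dt\right\} \\
&= \exp\left\{(-1)\,\mathbb{E}\int_0^T r(t)\,dt + \tfrac12(-1)^2\operatorname{var}\left(\int_0^T r(t)\,dt\right)\right\} \\
&= \exp\left\{-r(0)\int_0^T e^{-K(t)}\,dt - \int_0^T\!\!\int_0^t e^{-K(t)+K(u)}\alpha(u)\,du\,dt + \tfrac12\int_0^T e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2 dv\right\} \\
&= \exp\{-r(0)C(0,T) - A(0,T)\},
\end{aligned}
$$

where

$$
C(0,T) = \int_0^T e^{-K(t)}\,dt, \qquad
A(0,T) = \int_0^T\!\!\int_0^t e^{-K(t)+K(u)}\alpha(u)\,du\,dt - \tfrac12\int_0^T e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2 dv.
$$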


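Figure 30.1: Range of values of u, t for the integral.

30.1 Fiddling with the formulas


Note that (see Fig 30.1)

$$
\int_0^T\!\!\int_0^t e^{-K(t)+K(u)}\alpha(u)\,du\,dt = \int_0^T e^{K(u)}\alpha(u)\left(\int_u^T e^{-K(t)}\,dt\right)du
= \int_0^T e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right)dv.
$$

Therefore,

$$
\begin{aligned}
A(0,T) &= \int_0^T\left[e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right) - \tfrac12 e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2\right]dv, \\
C(0,T) &= \int_0^T e^{-K(y)}\,dy, \\
B(0,T) &= \exp\{-r(0)C(0,T) - A(0,T)\}.
\end{aligned}
$$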
Consider the price at time $t \in [0,T]$ of the zero-coupon bond:

$$B(t,T) = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\Big|\,\mathcal{F}(t)\right].$$

Because $r$ is a Markov process, this should be random only through a dependence on $r(t)$. In fact,

$$B(t,T) = \exp\{-r(t)C(t,T) - A(t,T)\},$$

where

$$
\begin{aligned}
A(t,T) &= \int_t^T\left[e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right) - \tfrac12 e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2\right]dv, \\
C(t,T) &= e^{K(t)}\int_t^T e^{-K(y)}\,dy.
\end{aligned}
$$

The reason for these changes is the following. We are now taking the initial time to be $t$ rather than
zero, so it is plausible that $\int_0^T \dots\,dv$ should be replaced by $\int_t^T \dots\,dv$. Recall that

$$K(v) = \int_0^v \beta(u)\,du,$$

and this should be replaced by

$$K(v) - K(t) = \int_t^v \beta(u)\,du.$$

Similarly, $K(y)$ should be replaced by $K(y) - K(t)$. Making these replacements in $A(0,T)$, we
see that the $K(t)$ terms cancel. In $C(0,T)$, however, the $K(t)$ term does not cancel.
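
To make the formulas for $C(t,T)$ and $A(t,T)$ concrete, the following sketch evaluates them by nested quadrature for hypothetical constant coefficients `a`, `b`, `s` (standing for $\alpha$, $\beta$, $\sigma$), where $K(t) = bt$, and compares them with the closed forms obtained by doing the integrals by hand in that special case.

```python
import math

a, b, s = 0.02, 0.5, 0.01   # hypothetical constant alpha, beta, sigma


def integrate(f, lo, hi, n=400):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def K(t):
    """K(t) = integral of beta over [0, t]; equals b * t for constant beta."""
    return b * t


def C(t, T):
    return math.exp(K(t)) * integrate(lambda y: math.exp(-K(y)), t, T)


def A(t, T):
    def integrand(v):
        inner = integrate(lambda y: math.exp(-K(y)), v, T)
        return (math.exp(K(v)) * a * inner
                - 0.5 * math.exp(2.0 * K(v)) * s ** 2 * inner ** 2)
    return integrate(integrand, t, T)


def bond_price(r_t, t, T):
    """B(t, T) = exp(-r(t) C(t, T) - A(t, T))."""
    return math.exp(-r_t * C(t, T) - A(t, T))


def C_closed(t, T):
    return (1.0 - math.exp(-b * (T - t))) / b


def A_closed(t, T):
    tau, c = T - t, C_closed(t, T)
    return (tau - c) * (a / b - s ** 2 / (2.0 * b ** 2)) + s ** 2 * c ** 2 / (4.0 * b)


price = bond_price(0.03, 1.0, 6.0)
```

With constant coefficients both functions depend on $t$ and $T$ only through $T - t$, which the closed forms make visible.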


30.2 Dynamics of the bond price 


Let $C_t(t,T)$ and $A_t(t,T)$ denote the partial derivatives with respect to $t$. From the formula

$$B(t,T) = \exp\{-r(t)C(t,T) - A(t,T)\},$$

we have

$$
\begin{aligned}
dB(t,T) &= B(t,T)\left[-C(t,T)\,dr(t) + \tfrac12 C^2(t,T)\,dr(t)\,dr(t) - r(t)C_t(t,T)\,dt - A_t(t,T)\,dt\right] \\
&= B(t,T)\Bigl[-C(t,T)\bigl(\alpha(t) - \beta(t)r(t)\bigr)\,dt - C(t,T)\sigma(t)\,dW(t) + \tfrac12 C^2(t,T)\sigma^2(t)\,dt \\
&\qquad\qquad - r(t)C_t(t,T)\,dt - A_t(t,T)\,dt\Bigr].
\end{aligned}
$$

Because we have used the risk-neutral pricing formula

$$B(t,T) = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\Big|\,\mathcal{F}(t)\right]$$

to obtain the bond price, its differential must be of the form

$$dB(t,T) = r(t)B(t,T)\,dt + (\dots)\,dW(t).$$

Therefore, we must have

$$-C(t,T)\bigl(\alpha(t) - \beta(t)r(t)\bigr) + \tfrac12 C^2(t,T)\sigma^2(t) - r(t)C_t(t,T) - A_t(t,T) = r(t).$$

We leave the verification of this equation to the homework. After this verification, we have the
formula

$$dB(t,T) = r(t)B(t,T)\,dt - \sigma(t)C(t,T)B(t,T)\,dW(t).$$

In particular, the volatility of the bond price is $\sigma(t)C(t,T)$.


30.3 Calibration of the Hull & White model 
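Recall:

$$
\begin{aligned}
dr(t) &= (\alpha(t) - \beta(t)r(t))\,dt + \sigma(t)\,dW(t), \\
C(t,T) &= e^{K(t)}\int_t^T e^{-K(y)}\,dy, \\
B(t,T) &= \exp\{-r(t)C(t,T) - A(t,T)\}.
\end{aligned}
$$

Suppose we obtain $B(0,T)$ for all $T \in [0,T^*]$ from market data (with some interpolation). Can we
determine the functions $\alpha(t)$, $\beta(t)$, and $\sigma(t)$ for all $t \in [0,T^*]$? Not quite. Here is what we can do.

We take the following input data for the calibration:

1. $B(0,T)$, $0 \le T \le T^*$;

2. $r(0)$;

3. $\alpha(0)$;

4. $\sigma(t)$, $0 \le t \le T^*$ (usually assumed to be constant);

5. $\sigma(0)C(0,T)$, $0 \le T \le T^*$, i.e., the volatility at time zero of bonds of all maturities.

Step 1. From 4 and 5 we solve for

$$C(0,T) = \int_0^T e^{-K(y)}\,dy.$$

We can then compute

$$
\begin{aligned}
\frac{\partial}{\partial T}C(0,T) &= e^{-K(T)} \\
\Longrightarrow\quad K(T) &= -\log\frac{\partial}{\partial T}C(0,T), \\
\frac{\partial}{\partial T}K(T) &= \frac{\partial}{\partial T}\int_0^T \beta(u)\,du = \beta(T).
\end{aligned}
$$

We now have $\beta(T)$ for all $T \in [0,T^*]$.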
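
Step 1 can be sketched with finite differences. In the example below the input curve `C0` is synthetic (generated from a constant $\beta$ so that the answer is known); in practice it would come from market volatilities, and the step size `h` is an arbitrary choice of ours.

```python
import math


def calibrate_beta(C0, T, h=1e-3):
    """Recover K(T) and beta(T) from the curve T -> C(0, T) by central differences."""
    def K(x):
        dC = (C0(x + h) - C0(x - h)) / (2.0 * h)   # approximates dC/dT = exp(-K(T))
        return -math.log(dC)
    beta = (K(T + h) - K(T - h)) / (2.0 * h)       # approximates K'(T) = beta(T)
    return K(T), beta


# Synthetic input: C(0, T) for constant beta = 0.5, so K(T) = 0.5 * T.
b_true = 0.5
C0 = lambda T: (1.0 - math.exp(-b_true * T)) / b_true

K_at_2, beta_at_2 = calibrate_beta(C0, 2.0)
```

Note that this step already differentiates the input twice, so noise in the market data is amplified.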
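Step 2. From the formula

$$B(0,T) = \exp\{-r(0)C(0,T) - A(0,T)\},$$

we can solve for $A(0,T)$ for all $T \in [0,T^*]$. Recall that

$$A(0,T) = \int_0^T\left[e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right) - \tfrac12 e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2\right]dv.$$

We can use this formula to determine $\alpha(T)$, $0 \le T \le T^*$, as follows:

$$
\begin{aligned}
\frac{\partial}{\partial T}A(0,T) &= \int_0^T\left[e^{K(v)}\alpha(v)e^{-K(T)} - e^{2K(v)}\sigma^2(v)e^{-K(T)}\left(\int_v^T e^{-K(y)}\,dy\right)\right]dv, \\
e^{K(T)}\frac{\partial}{\partial T}A(0,T) &= \int_0^T\left[e^{K(v)}\alpha(v) - e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)\right]dv, \\
\frac{\partial}{\partial T}\left[e^{K(T)}\frac{\partial}{\partial T}A(0,T)\right] &= e^{K(T)}\alpha(T) - \int_0^T e^{2K(v)}\sigma^2(v)e^{-K(T)}\,dv, \\
e^{K(T)}\frac{\partial}{\partial T}\left[e^{K(T)}\frac{\partial}{\partial T}A(0,T)\right] &= e^{2K(T)}\alpha(T) - \int_0^T e^{2K(v)}\sigma^2(v)\,dv, \\
\frac{\partial}{\partial T}\left[e^{K(T)}\frac{\partial}{\partial T}\left[e^{K(T)}\frac{\partial}{\partial T}A(0,T)\right]\right] &= \alpha'(T)e^{2K(T)} + 2\alpha(T)\beta(T)e^{2K(T)} - e^{2K(T)}\sigma^2(T), \quad 0 \le T \le T^*.
\end{aligned}
$$

This gives us an ordinary differential equation for $\alpha$, i.e.,

$$\alpha'(t)e^{2K(t)} + 2\alpha(t)\beta(t)e^{2K(t)} - e^{2K(t)}\sigma^2(t) = \text{known function of } t.$$

From assumption 4 and step 1, we know all the coefficients in this equation. From assumption 3,
we have the initial condition $\alpha(0)$. We can solve the equation numerically to determine the function
$\alpha(t)$, $0 \le t \le T^*$.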


Remark 30.1 The derivation of the ordinary differential equation for $\alpha(t)$ requires three differentiations. Differentiation is an unstable procedure, i.e., functions which are close can have very
different derivatives. Consider, for example,

$$f(x) = 0 \quad \forall x \in \mathbb{R}, \qquad g(x) = \frac{\sin(1000x)}{100} \quad \forall x \in \mathbb{R}.$$

Then

$$|f(x) - g(x)| \le \frac{1}{100} \quad \forall x \in \mathbb{R},$$

but because

$$g'(x) = 10\cos(1000x),$$

we have

$$|f'(x) - g'(x)| = 10$$

for many values of $x$.
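
The instability is easy to observe numerically. The sketch below evaluates the two functions of the remark on a grid (the grid itself is an arbitrary choice) and compares the largest gap between the functions with the largest gap between their derivatives.

```python
import math


def f(x: float) -> float:
    return 0.0


def g(x: float) -> float:
    return math.sin(1000.0 * x) / 100.0


def g_prime(x: float) -> float:
    return 10.0 * math.cos(1000.0 * x)


xs = [i / 10000.0 for i in range(10001)]           # grid on [0, 1]
max_gap = max(abs(f(x) - g(x)) for x in xs)        # never exceeds 1/100
max_deriv_gap = max(abs(g_prime(x)) for x in xs)   # f' is zero; attained at x = 0
```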


Assumption 5 for the calibration was that we know the volatility at time zero of bonds of all maturities. These volatilities can be implied by the prices of options on bonds. We consider now how the
model prices options.


30.4 Option on a bond 


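Consider a European call option on a zero-coupon bond with strike price $K$ and expiration time $T_1$.
The bond matures at time $T_2 > T_1$. The price of the option at time 0 is

$$
\begin{aligned}
&\mathbb{E}\left[e^{-\int_0^{T_1} r(u)\,du}\bigl(B(T_1,T_2) - K\bigr)^+\right] \\
&\quad= \mathbb{E}\left[e^{-\int_0^{T_1} r(u)\,du}\bigl(\exp\{-r(T_1)C(T_1,T_2) - A(T_1,T_2)\} - K\bigr)^+\right] \\
&\quad= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x}\bigl(\exp\{-yC(T_1,T_2) - A(T_1,T_2)\} - K\bigr)^+ f(x,y)\,dx\,dy,
\end{aligned}
$$

where $f(x,y)$ is the joint density of $\left(\int_0^{T_1} r(u)\,du,\ r(T_1)\right)$.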


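We observed at the beginning of this Chapter (equation (0.3)) that $\int_0^{T_1} r(u)\,du$ is normal with

$$
\begin{aligned}
\mu_1 &= \mathbb{E}\left[\int_0^{T_1} r(u)\,du\right] = \int_0^{T_1}\mathbb{E}r(u)\,du, \\
\sigma_1^2 &= \operatorname{var}\left[\int_0^{T_1} r(u)\,du\right] = \int_0^{T_1} e^{2K(v)}\sigma^2(v)\left(\int_v^{T_1} e^{-K(y)}\,dy\right)^2 dv.
\end{aligned}
$$

We also observed (equation (0.1)) that $r(T_1)$ is normal with

$$
\begin{aligned}
\mu_2 &= \mathbb{E}r(T_1) = r(0)e^{-K(T_1)} + e^{-K(T_1)}\int_0^{T_1} e^{K(u)}\alpha(u)\,du, \\
\sigma_2^2 &= \operatorname{var}\bigl(r(T_1)\bigr) = e^{-2K(T_1)}\int_0^{T_1} e^{2K(u)}\sigma^2(u)\,du.
\end{aligned}
$$

In fact, $\left(\int_0^{T_1} r(u)\,du,\ r(T_1)\right)$ is jointly normal, and the covariance is

$$
\begin{aligned}
\rho\sigma_1\sigma_2 &= \mathbb{E}\left[\int_0^{T_1}\bigl(r(u) - \mathbb{E}r(u)\bigr)\,du \cdot \bigl(r(T_1) - \mathbb{E}r(T_1)\bigr)\right] \\
&= \int_0^{T_1}\mathbb{E}\left[\bigl(r(u) - \mathbb{E}r(u)\bigr)\bigl(r(T_1) - \mathbb{E}r(T_1)\bigr)\right]du \\
&= \int_0^{T_1}\rho_r(u,T_1)\,du,
\end{aligned}
$$

where $\rho_r(u,T_1)$ is defined in Equation 0.2.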


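The option on the bond has price at time zero of

$$
\begin{aligned}
&\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x}\bigl(\exp\{-yC(T_1,T_2) - A(T_1,T_2)\} - K\bigr)^+ \\
&\qquad\cdot\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\}dx\,dy. \qquad (4.1)
\end{aligned}
$$

The price of the option at time $t \in [0,T_1]$ is

$$
\begin{aligned}
&\mathbb{E}\left[e^{-\int_t^{T_1} r(u)\,du}\bigl(B(T_1,T_2) - K\bigr)^+\,\Big|\,\mathcal{F}(t)\right] \\
&\quad= \mathbb{E}\left[e^{-\int_t^{T_1} r(u)\,du}\bigl(\exp\{-r(T_1)C(T_1,T_2) - A(T_1,T_2)\} - K\bigr)^+\,\Big|\,\mathcal{F}(t)\right]. \qquad (4.2)
\end{aligned}
$$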


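Because of the Markov property, this is random only through a dependence on $r(t)$. To compute
this option price, we need the joint distribution of $\left(\int_t^{T_1} r(u)\,du,\ r(T_1)\right)$ conditioned on $r(t)$. This
pair of random variables has a jointly normal conditional distribution, and

$$
\begin{aligned}
\mu_1(t) &= \mathbb{E}\left[\int_t^{T_1} r(u)\,du\,\Big|\,\mathcal{F}(t)\right] = \int_t^{T_1}\left[r(t)e^{-K(v)+K(t)} + e^{-K(v)}\int_t^v e^{K(u)}\alpha(u)\,du\right]dv, \\
\sigma_1^2(t) &= \mathbb{E}\left[\left(\int_t^{T_1} r(u)\,du - \mu_1(t)\right)^2\Big|\,\mathcal{F}(t)\right] = \int_t^{T_1} e^{2K(v)}\sigma^2(v)\left(\int_v^{T_1} e^{-K(y)}\,dy\right)^2 dv, \\
\mu_2(t) &= \mathbb{E}\left[r(T_1)\,\big|\,r(t)\right] = r(t)e^{-K(T_1)+K(t)} + e^{-K(T_1)}\int_t^{T_1} e^{K(u)}\alpha(u)\,du, \\
\sigma_2^2(t) &= \mathbb{E}\left[\bigl(r(T_1) - \mu_2(t)\bigr)^2\,\Big|\,\mathcal{F}(t)\right] = e^{-2K(T_1)}\int_t^{T_1} e^{2K(u)}\sigma^2(u)\,du, \\
\rho(t)\sigma_1(t)\sigma_2(t) &= \mathbb{E}\left[\left(\int_t^{T_1} r(u)\,du - \mu_1(t)\right)\bigl(r(T_1) - \mu_2(t)\bigr)\,\Big|\,\mathcal{F}(t)\right] \\
&= \int_t^{T_1} e^{-K(u)-K(T_1)}\int_t^u e^{2K(v)}\sigma^2(v)\,dv\,du.
\end{aligned}
$$

The variances and covariances are not random. The means are random through a dependence on
$r(t)$.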
Advantages of the Hull & White model:

1. Leads to closed-form pricing formulas.

2. Allows calibration to fit initial yield curve exactly.

Short-comings of the Hull & White model:

1. One-factor, so only allows parallel shifts of the yield curve, i.e.,

$$B(t,T) = \exp\{-r(t)C(t,T) - A(t,T)\},$$

so bond prices of all maturities are perfectly correlated.

2. Interest rate is normally distributed, and hence can take negative values. Consequently, the
bond price

$$B(t,T) = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\Big|\,\mathcal{F}(t)\right]$$

can exceed 1.


Chapter 31 


Cox-Ingersoll-Ross model 


In the Hull & White model, $r(t)$ is a Gaussian process. Since, for each $t$, $r(t)$ is normally distributed,
there is a positive probability that $r(t) < 0$. The Cox-Ingersoll-Ross model is the simplest one which
avoids negative interest rates.


We begin with a $d$-dimensional Brownian motion $(W_1, W_2, \dots, W_d)$. Let $\beta > 0$ and $\sigma > 0$ be
constants. For $j = 1, \dots, d$, let $X_j(0) \in \mathbb{R}$ be given so that

$$X_1^2(0) + X_2^2(0) + \dots + X_d^2(0) \ge 0,$$

and let $X_j$ be the solution to the stochastic differential equation

$$dX_j(t) = -\tfrac12\beta X_j(t)\,dt + \tfrac12\sigma\,dW_j(t).$$

$X_j$ is called the Ornstein-Uhlenbeck process. It always has a drift toward the origin. The solution to
this stochastic differential equation is

$$X_j(t) = e^{-\frac12\beta t}\left[X_j(0) + \tfrac12\sigma\int_0^t e^{\frac12\beta u}\,dW_j(u)\right].$$

This solution is a Gaussian process with mean function

$$m_j(t) = e^{-\frac12\beta t}X_j(0)$$

and covariance function

$$\rho(s,t) = \tfrac14\sigma^2 e^{-\frac12\beta(s+t)}\int_0^{s \wedge t} e^{\beta u}\,du.$$

Define

$$r(t) = X_1^2(t) + X_2^2(t) + \dots + X_d^2(t).$$

If $d = 1$, we have $r(t) = X_1^2(t)$ and for each $t$, $\mathbb{P}\{r(t) > 0\} = 1$, but (see Fig. 31.1)

$$\mathbb{P}\{\text{There are infinitely many values of } t > 0 \text{ for which } r(t) = 0\} = 1.$$


Figure 31.1: r(t) can be zero.


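If $d \ge 2$, (see Fig. 31.1)

$$\mathbb{P}\{\text{There is at least one value of } t > 0 \text{ for which } r(t) = 0\} = 0.$$

Let $f(x_1, x_2, \dots, x_d) = x_1^2 + x_2^2 + \dots + x_d^2$. Then

$$f_{x_i} = 2x_i, \qquad f_{x_i x_j} = \begin{cases} 2 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Itô's formula implies

$$
\begin{aligned}
dr(t) &= \sum_{i=1}^d f_{x_i}\,dX_i + \tfrac12\sum_{i=1}^d f_{x_i x_i}\,dX_i\,dX_i \\
&= \sum_{i=1}^d 2X_i\left(-\tfrac12\beta X_i\,dt + \tfrac12\sigma\,dW_i(t)\right) + \sum_{i=1}^d \tfrac14\sigma^2\,dW_i\,dW_i \\
&= -\beta r(t)\,dt + \sigma\sum_{i=1}^d X_i\,dW_i + \frac{d\sigma^2}{4}\,dt \\
&= \left(\frac{d\sigma^2}{4} - \beta r(t)\right)dt + \sigma\sqrt{r(t)}\sum_{i=1}^d \frac{X_i(t)}{\sqrt{r(t)}}\,dW_i(t).
\end{aligned}
$$

Define

$$W(t) = \sum_{i=1}^d \int_0^t \frac{X_i(u)}{\sqrt{r(u)}}\,dW_i(u).$$

Then $W$ is a martingale,

$$dW = \sum_{i=1}^d \frac{X_i}{\sqrt{r}}\,dW_i, \qquad dW\,dW = \sum_{i=1}^d \frac{X_i^2}{r}\,dt = dt,$$

so $W$ is a Brownian motion. We have

$$dr(t) = \left(\frac{d\sigma^2}{4} - \beta r(t)\right)dt + \sigma\sqrt{r(t)}\,dW(t).$$

The Cox-Ingersoll-Ross (CIR) process is given by

$$dr(t) = (\alpha - \beta r(t))\,dt + \sigma\sqrt{r(t)}\,dW(t).$$

We define

$$d = \frac{4\alpha}{\sigma^2} > 0.$$

If $d$ happens to be an integer, then we have the representation

$$r(t) = \sum_{i=1}^d X_i^2(t),$$

but we do not require $d$ to be an integer. If $d < 2$ (i.e., $\alpha < \tfrac12\sigma^2$), then

$$\mathbb{P}\{\text{There are infinitely many values of } t > 0 \text{ for which } r(t) = 0\} = 1.$$

This is not a good parameter choice.

If $d \ge 2$ (i.e., $\alpha \ge \tfrac12\sigma^2$), then

$$\mathbb{P}\{\text{There is at least one value of } t > 0 \text{ for which } r(t) = 0\} = 0.$$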
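
When $d$ is an integer, the representation of $r(t)$ as a sum of squared Ornstein-Uhlenbeck processes gives a simple way to simulate the CIR process without ever producing a negative rate. The sketch below is one possible implementation under our own choices: each $X_i$ is advanced with its exact Gaussian transition over a step of length $h$, all of $r(0)$ is placed in the last coordinate, and the parameter values and seed are hypothetical.

```python
import math
import random


def cir_dimension(alpha: float, sigma: float) -> float:
    """d = 4 alpha / sigma^2."""
    return 4.0 * alpha / sigma ** 2


def simulate_cir_via_ou(r0, d, beta, sigma, T, steps, rng):
    """Path of r = X_1^2 + ... + X_d^2 for integer d on a uniform time grid."""
    h = T / steps
    decay = math.exp(-0.5 * beta * h)
    # Standard deviation of X_i(t + h) given X_i(t).
    step_sd = math.sqrt(sigma ** 2 / (4.0 * beta) * (1.0 - math.exp(-beta * h)))
    x = [0.0] * (d - 1) + [math.sqrt(r0)]
    path = [sum(xi * xi for xi in x)]
    for _ in range(steps):
        x = [decay * xi + step_sd * rng.gauss(0.0, 1.0) for xi in x]
        path.append(sum(xi * xi for xi in x))
    return path


# Hypothetical parameters with d = 3, i.e. alpha = 3 sigma^2 / 4.
beta, sigma, d = 0.5, 0.1, 3
alpha = d * sigma ** 2 / 4.0
rng = random.Random(1)
path = simulate_cir_via_ou(0.04, d, beta, sigma, 5.0, 500, rng)
```

For non-integer $d$ this construction is not available and a different scheme would be needed.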


With the CIR process, one can derive formulas under the assumption that d = 4% is a positive 


integer, and they are still correct even when d is not an integer. 


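For example, here is the distribution of $r(t)$ for fixed $t > 0$. Let $r(0) > 0$ be given. Take

$$X_1(0) = 0, \quad X_2(0) = 0, \quad \dots, \quad X_{d-1}(0) = 0, \quad X_d(0) = \sqrt{r(0)}.$$

For $i = 1, 2, \dots, d-1$, $X_i(t)$ is normal with mean zero and variance

$$\rho(t,t) = \frac{\sigma^2}{4\beta} \left( 1 - e^{-\beta t} \right).$$

$X_d(t)$ is normal with mean

$$m_d(t) = e^{-\frac{1}{2}\beta t} \sqrt{r(0)}$$

and variance $\rho(t,t)$. Then

$$r(t) = \rho(t,t) \underbrace{\sum_{i=1}^{d-1} \left( \frac{X_i(t)}{\sqrt{\rho(t,t)}} \right)^2}_{\text{chi-square with } d-1 = \frac{4\alpha - \sigma^2}{\sigma^2} \text{ degrees of freedom}} + \underbrace{X_d^2(t)}_{\text{normal squared and independent of the other term}} \tag{0.1}$$

Thus $r(t)$ has a non-central chi-square distribution.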
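The representation above can be checked numerically. The following Python sketch samples $r(t)$ as a sum of $d$ squared Gaussians, using the mean $m_d(t)$ and variance $\rho(t,t)$ stated above, and compares the Monte Carlo average with $m_d^2(t) + d\,\rho(t,t)$. The parameter values, the seed, and the function names are assumptions made for illustration only.

```python
import math
import random

def rho_tt(t, beta, sigma):
    """Variance rho(t, t) = sigma^2 / (4 beta) * (1 - exp(-beta t)) of each X_i(t)."""
    return sigma**2 / (4.0 * beta) * (1.0 - math.exp(-beta * t))

def sample_r(t, r0, d, beta, sigma, rng):
    """Draw r(t) = X_1(t)^2 + ... + X_d(t)^2 with X_1(0) = ... = X_{d-1}(0) = 0."""
    sd = math.sqrt(rho_tt(t, beta, sigma))
    mean_d = math.exp(-0.5 * beta * t) * math.sqrt(r0)   # m_d(t)
    total = 0.0
    for i in range(d):
        mean = mean_d if i == d - 1 else 0.0
        total += rng.gauss(mean, sd) ** 2
    return total

def exact_mean(t, r0, d, beta, sigma):
    """E r(t) = m_d(t)^2 + d * rho(t, t)."""
    return math.exp(-beta * t) * r0 + d * rho_tt(t, beta, sigma)

# Hypothetical parameters: d = 3 corresponds to alpha = d * sigma^2 / 4 = 0.03.
rng = random.Random(12345)
d, beta, sigma, r0, t = 3, 1.0, 0.2, 0.05, 1.0
samples = [sample_r(t, r0, d, beta, sigma, rng) for _ in range(20000)]
mc_mean = sum(samples) / len(samples)
target = exact_mean(t, r0, d, beta, sigma)
```

With $\alpha = d\sigma^2/4$, the target value equals $e^{-\beta t} r(0) + \frac{\alpha}{\beta}(1 - e^{-\beta t})$, so the sample mean should drift toward $\alpha/\beta$ as $t$ grows.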
31.1 Equilibrium distribution of r(t) 
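As $t \to \infty$, $m_d(t) \to 0$. We have

$$r(t) = \rho(t,t) \sum_{i=1}^{d} \left( \frac{X_i(t)}{\sqrt{\rho(t,t)}} \right)^2.$$

As $t \to \infty$, we have $\rho(t,t) \to \frac{\sigma^2}{4\beta}$, and so the limiting distribution of $r(t)$ is $\frac{\sigma^2}{4\beta}$ times a chi-square with $d = \frac{4\alpha}{\sigma^2}$ degrees of freedom. The chi-square density with $\frac{4\alpha}{\sigma^2}$ degrees of freedom is

$$f(y) = \frac{1}{2^{2\alpha/\sigma^2}\, \Gamma\!\left( \frac{2\alpha}{\sigma^2} \right)}\, y^{\frac{2\alpha - \sigma^2}{\sigma^2}}\, e^{-y/2}.$$

We make the change of variable $r = \frac{\sigma^2}{4\beta} y$. The limiting density for $r(t)$ is

$$p(r) = \frac{4\beta}{\sigma^2} \cdot \frac{1}{2^{2\alpha/\sigma^2}\, \Gamma\!\left( \frac{2\alpha}{\sigma^2} \right)} \left( \frac{4\beta}{\sigma^2} r \right)^{\frac{2\alpha - \sigma^2}{\sigma^2}} e^{-\frac{2\beta}{\sigma^2} r} = \left( \frac{2\beta}{\sigma^2} \right)^{\frac{2\alpha}{\sigma^2}} \frac{1}{\Gamma\!\left( \frac{2\alpha}{\sigma^2} \right)}\, r^{\frac{2\alpha - \sigma^2}{\sigma^2}}\, e^{-\frac{2\beta}{\sigma^2} r}.$$

We computed the mean and variance of $r(t)$ in Section 15.7.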
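The limiting density is a Gamma density with shape $2\alpha/\sigma^2$ and rate $2\beta/\sigma^2$. As a rough numerical check, the Python sketch below evaluates it and integrates it with a midpoint rule; the parameter values, the integration range $[0, 1]$, and the helper names are assumptions chosen for illustration.

```python
import math

def cir_equilibrium_density(r, alpha, beta, sigma):
    """Limiting density p(r): Gamma with shape 2*alpha/sigma^2 and rate 2*beta/sigma^2."""
    if r <= 0.0:
        return 0.0
    shape = 2.0 * alpha / sigma**2
    rate = 2.0 * beta / sigma**2
    log_p = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1.0) * math.log(r) - rate * r)
    return math.exp(log_p)

def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Hypothetical parameters with d = 4*alpha/sigma^2 = 6; mass beyond r = 1 is negligible here.
alpha, beta, sigma = 0.06, 1.0, 0.2
p = lambda r: cir_equilibrium_density(r, alpha, beta, sigma)
total_mass = midpoint_integral(p, 0.0, 1.0)
mean = midpoint_integral(lambda r: r * p(r), 0.0, 1.0)
```

For these values the integral should be close to 1 and the mean close to $\alpha/\beta = 0.06$, the mean of a Gamma distribution with this shape and rate.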


31.2 Kolmogorov forward equation 


Consider a Markov process governed by the stochastic differential equation 


dX (t) = b(X(t)) dt + o(X(t)) dW(t). 




Figure 31.2: The function h(y) 


Because we are going to apply the following analysis to the case X(t) = r(t), we assume that 
X(t) > 0 for all t. 

We start at $X(0) = x > 0$ at time 0. Then $X(t)$ is random with density $p(0, t, x, y)$ (in the $y$ variable). Since 0 and $x$ will not change during the following, we omit them and write $p(t, y)$ rather than $p(0, t, x, y)$. We have

$$\mathbb{E}\, h(X(t)) = \int_0^\infty h(y)\, p(t,y)\, dy$$

for any function $h$.


The Kolmogorov forward equation (KFE) is a partial differential equation in the “forward” variables 
t and y. We derive it below. 


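Let $h(y)$ be a smooth function of $y > 0$ which vanishes near $y = 0$ and for all large values of $y$ (see Fig. 31.2). Itô's formula implies

$$dh(X(t)) = \left[ h'(X(t))\, b(X(t)) + \tfrac{1}{2} h''(X(t))\, \sigma^2(X(t)) \right] dt + h'(X(t))\, \sigma(X(t))\, dW(t),$$

so

$$h(X(t)) = h(X(0)) + \int_0^t \left[ h'(X(s))\, b(X(s)) + \tfrac{1}{2} h''(X(s))\, \sigma^2(X(s)) \right] ds + \int_0^t h'(X(s))\, \sigma(X(s))\, dW(s),$$

$$\mathbb{E}\, h(X(t)) = h(X(0)) + \mathbb{E} \int_0^t \left[ h'(X(s))\, b(X(s)) + \tfrac{1}{2} h''(X(s))\, \sigma^2(X(s)) \right] ds,$$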


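or equivalently,

$$\int_0^\infty h(y)\, p(t,y)\, dy = h(X(0)) + \int_0^t \int_0^\infty h'(y)\, b(y)\, p(s,y)\, dy\, ds + \tfrac{1}{2} \int_0^t \int_0^\infty h''(y)\, \sigma^2(y)\, p(s,y)\, dy\, ds.$$

Differentiate with respect to $t$ to get

$$\int_0^\infty h(y)\, p_t(t,y)\, dy = \int_0^\infty h'(y)\, b(y)\, p(t,y)\, dy + \tfrac{1}{2} \int_0^\infty h''(y)\, \sigma^2(y)\, p(t,y)\, dy.$$

Integration by parts yields

$$\int_0^\infty h'(y)\, b(y)\, p(t,y)\, dy = \underbrace{h(y)\, b(y)\, p(t,y) \Big|_{y=0}^{y=\infty}}_{=0} - \int_0^\infty h(y)\, \frac{\partial}{\partial y} \big( b(y)\, p(t,y) \big)\, dy,$$

$$\int_0^\infty h''(y)\, \sigma^2(y)\, p(t,y)\, dy = \underbrace{h'(y)\, \sigma^2(y)\, p(t,y) \Big|_{y=0}^{y=\infty}}_{=0} - \int_0^\infty h'(y)\, \frac{\partial}{\partial y} \big( \sigma^2(y)\, p(t,y) \big)\, dy = \int_0^\infty h(y)\, \frac{\partial^2}{\partial y^2} \big( \sigma^2(y)\, p(t,y) \big)\, dy.$$

Therefore,

$$\int_0^\infty h(y)\, p_t(t,y)\, dy = -\int_0^\infty h(y)\, \frac{\partial}{\partial y} \big( b(y)\, p(t,y) \big)\, dy + \tfrac{1}{2} \int_0^\infty h(y)\, \frac{\partial^2}{\partial y^2} \big( \sigma^2(y)\, p(t,y) \big)\, dy,$$

or equivalently,

$$\int_0^\infty h(y) \left[ p_t(t,y) + \frac{\partial}{\partial y} \big( b(y)\, p(t,y) \big) - \tfrac{1}{2} \frac{\partial^2}{\partial y^2} \big( \sigma^2(y)\, p(t,y) \big) \right] dy = 0.$$

This last equation holds for every function $h$ of the form in Figure 31.2. It implies that

$$p_t(t,y) + \frac{\partial}{\partial y} \big( b(y)\, p(t,y) \big) - \tfrac{1}{2} \frac{\partial^2}{\partial y^2} \big( \sigma^2(y)\, p(t,y) \big) = 0. \tag{KFE}$$

If there were a place where (KFE) did not hold, then we could take $h(y) > 0$ at that and nearby points, but take $h$ to be zero elsewhere, and we would obtain

$$\int_0^\infty h \left[ p_t + \frac{\partial}{\partial y} (b\, p) - \tfrac{1}{2} \frac{\partial^2}{\partial y^2} (\sigma^2 p) \right] dy \ne 0.$$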


If the process $X(t)$ has an equilibrium density, it will be

$$p(y) = \lim_{t \to \infty} p(t, y).$$

In order for this limit to exist, we must have

$$0 = \lim_{t \to \infty} p_t(t, y).$$

Letting $t \to \infty$ in (KFE), we obtain the equilibrium Kolmogorov forward equation

$$\frac{\partial}{\partial y} \big( b(y)\, p(y) \big) - \tfrac{1}{2} \frac{\partial^2}{\partial y^2} \big( \sigma^2(y)\, p(y) \big) = 0.$$

When an equilibrium density exists, it is the unique solution to this equation satisfying

$$p(y) \ge 0 \quad \forall y \ge 0, \qquad \int_0^\infty p(y)\, dy = 1.$$


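31.3 Cox-Ingersoll-Ross equilibrium density

We computed this to be

$$p(r) = C\, r^{\frac{2\alpha - \sigma^2}{\sigma^2}}\, e^{-\frac{2\beta}{\sigma^2} r}, \qquad \text{where } C = \left( \frac{2\beta}{\sigma^2} \right)^{\frac{2\alpha}{\sigma^2}} \frac{1}{\Gamma\!\left( \frac{2\alpha}{\sigma^2} \right)}.$$

We compute

$$\begin{aligned}
p'(r) &= \frac{2\alpha - \sigma^2}{\sigma^2} \cdot \frac{p(r)}{r} - \frac{2\beta}{\sigma^2}\, p(r) \\
&= \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) p(r), \\
p''(r) &= -\frac{2}{\sigma^2 r^2} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) p(r) + \frac{2}{\sigma^2 r} (-\beta)\, p(r) + \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) p'(r) \\
&= \frac{2}{\sigma^2 r} \left[ -\frac{1}{r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) - \beta + \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right)^2 \right] p(r).
\end{aligned}$$

We want to verify the equilibrium Kolmogorov forward equation for the CIR process:

$$\frac{\partial}{\partial r} \big( (\alpha - \beta r)\, p(r) \big) - \tfrac{1}{2} \frac{\partial^2}{\partial r^2} \big( \sigma^2 r\, p(r) \big) = 0. \tag{EKFE}$$

The LHS of (EKFE) becomes

$$\begin{aligned}
&-\beta p(r) + (\alpha - \beta r)\, p'(r) - \sigma^2 p'(r) - \tfrac{1}{2}\sigma^2 r\, p''(r) \\
&\quad = p(r) \left[ -\beta + (\alpha - \beta r - \sigma^2) \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) + \frac{1}{r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) + \beta - \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right)^2 \right] \\
&\quad = p(r) \left[ \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) - \tfrac{1}{2}\sigma^2 \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) + \frac{1}{r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right) - \frac{2}{\sigma^2 r} \left( \alpha - \tfrac{1}{2}\sigma^2 - \beta r \right)^2 \right] \\
&\quad = 0,
\end{aligned}$$

as expected.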
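The algebraic verification above can also be spot-checked numerically. The Python sketch below approximates the left-hand side of (EKFE) with central finite differences applied to the Gamma-form density; the step size `h`, the parameter values, and the function names are illustrative assumptions, and the residual is only expected to be small, not exactly zero.

```python
import math

def density(r, alpha, beta, sigma):
    """Equilibrium density p(r) of the CIR process (valid for r > 0)."""
    shape = 2.0 * alpha / sigma**2
    rate = 2.0 * beta / sigma**2
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1.0) * math.log(r) - rate * r)

def ekfe_residual(r, alpha, beta, sigma, h=1e-5):
    """Finite-difference value of d/dr[(alpha - beta r) p] - 0.5 d^2/dr^2[sigma^2 r p]."""
    drift_term = lambda x: (alpha - beta * x) * density(x, alpha, beta, sigma)
    diffusion_term = lambda x: sigma**2 * x * density(x, alpha, beta, sigma)
    first = (drift_term(r + h) - drift_term(r - h)) / (2.0 * h)
    second = (diffusion_term(r + h) - 2.0 * diffusion_term(r)
              + diffusion_term(r - h)) / h**2
    return first - 0.5 * second

# Hypothetical parameters; individual terms are of order 10, the residual far smaller.
alpha, beta, sigma = 0.06, 1.0, 0.2
residuals = [ekfe_residual(r, alpha, beta, sigma) for r in (0.02, 0.05, 0.1)]
```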


31.4 Bond prices in the CIR model 


The interest rate process $r(t)$ is given by

$$dr(t) = (\alpha - \beta r(t))\, dt + \sigma \sqrt{r(t)}\, dW(t),$$

where $r(0)$ is given. The bond price process is

$$B(t,T) = \mathbb{E} \left[ \exp\left\{ -\int_t^T r(u)\, du \right\} \Big| \mathcal{F}(t) \right].$$

Because

$$\exp\left\{ -\int_0^t r(u)\, du \right\} B(t,T) = \mathbb{E} \left[ \exp\left\{ -\int_0^T r(u)\, du \right\} \Big| \mathcal{F}(t) \right],$$

the tower property implies that this is a martingale. The Markov property implies that $B(t,T)$ is random only through a dependence on $r(t)$. Thus, there is a function $B(r,t,T)$ of the three dummy variables $r, t, T$ such that the process $B(t,T)$ is the function $B(r,t,T)$ evaluated at $r(t), t, T$, i.e.,

$$B(t,T) = B(r(t),t,T).$$


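Because $\exp\left\{ -\int_0^t r(u)\, du \right\} B(r(t),t,T)$ is a martingale, its differential has no $dt$ term. We compute

$$d\left( \exp\left\{ -\int_0^t r(u)\, du \right\} B(r(t),t,T) \right) = \exp\left\{ -\int_0^t r(u)\, du \right\} \left[ -r(t) B\, dt + B_r\, dr(t) + \tfrac{1}{2} B_{rr}\, dr(t)\, dr(t) + B_t\, dt \right].$$

The expression in $[\dots]$ equals

$$-rB\, dt + B_r (\alpha - \beta r)\, dt + B_r \sigma \sqrt{r}\, dW + \tfrac{1}{2} B_{rr} \sigma^2 r\, dt + B_t\, dt.$$

Setting the $dt$ term to zero, we obtain the partial differential equation

$$-rB(r,t,T) + B_t(r,t,T) + (\alpha - \beta r) B_r(r,t,T) + \tfrac{1}{2}\sigma^2 r B_{rr}(r,t,T) = 0, \qquad 0 \le t < T, \; r \ge 0. \tag{4.1}$$

The terminal condition is

$$B(r,T,T) = 1, \qquad r \ge 0.$$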

Surprisingly, this equation has a closed form solution. Using the Hull & White model as a guide, we look for a solution of the form

$$B(r,t,T) = e^{-rC(t,T) - A(t,T)},$$

where $C(T,T) = 0$, $A(T,T) = 0$. Then we have

$$B_t = (-rC_t - A_t) B, \qquad B_r = -CB, \qquad B_{rr} = C^2 B,$$

and the partial differential equation becomes

$$\begin{aligned}
0 &= -rB + (-rC_t - A_t) B - (\alpha - \beta r) CB + \tfrac{1}{2}\sigma^2 r C^2 B \\
&= rB \left( -1 - C_t + \beta C + \tfrac{1}{2}\sigma^2 C^2 \right) - B (A_t + \alpha C).
\end{aligned}$$

We first solve the ordinary differential equation

$$-1 - C_t(t,T) + \beta C(t,T) + \tfrac{1}{2}\sigma^2 C^2(t,T) = 0; \qquad C(T,T) = 0,$$

and then set

$$A(t,T) = \alpha \int_t^T C(u,T)\, du,$$

so $A(T,T) = 0$ and

$$A_t(t,T) = -\alpha C(t,T).$$
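It is tedious but straightforward to check that the solutions are given by

$$C(t,T) = \frac{\sinh(\gamma (T-t))}{\gamma \cosh(\gamma (T-t)) + \frac{1}{2}\beta \sinh(\gamma (T-t))},$$

$$A(t,T) = -\frac{2\alpha}{\sigma^2} \log \left[ \frac{\gamma e^{\frac{1}{2}\beta (T-t)}}{\gamma \cosh(\gamma (T-t)) + \frac{1}{2}\beta \sinh(\gamma (T-t))} \right],$$

where

$$\gamma = \tfrac{1}{2} \sqrt{\beta^2 + 2\sigma^2}, \qquad \sinh u = \frac{e^u - e^{-u}}{2}, \qquad \cosh u = \frac{e^u + e^{-u}}{2}.$$

Thus in the CIR model, we have

$$\mathbb{E} \left[ \exp\left\{ -\int_t^T r(u)\, du \right\} \Big| \mathcal{F}(t) \right] = B(r(t),t,T),$$

where

$$B(r,t,T) = \exp\{ -rC(t,T) - A(t,T) \}, \qquad 0 \le t < T, \; r \ge 0,$$

and $C(t,T)$ and $A(t,T)$ are given by the formulas above. Because the coefficients in

$$dr(t) = (\alpha - \beta r(t))\, dt + \sigma \sqrt{r(t)}\, dW(t)$$

do not depend on $t$, the function $B(r,t,T)$ depends on $t$ and $T$ only through their difference $\tau = T - t$. Similarly, $C(t,T)$ and $A(t,T)$ are functions of $\tau = T - t$. We write $B(r,\tau)$ instead of $B(r,t,T)$, and we have

$$B(r,\tau) = \exp\{ -rC(\tau) - A(\tau) \}, \qquad \tau \ge 0, \; r \ge 0,$$

where

$$C(\tau) = \frac{\sinh(\gamma\tau)}{\gamma \cosh(\gamma\tau) + \frac{1}{2}\beta \sinh(\gamma\tau)},$$

$$A(\tau) = -\frac{2\alpha}{\sigma^2} \log \left[ \frac{\gamma e^{\frac{1}{2}\beta\tau}}{\gamma \cosh(\gamma\tau) + \frac{1}{2}\beta \sinh(\gamma\tau)} \right],$$

$$\gamma = \tfrac{1}{2} \sqrt{\beta^2 + 2\sigma^2}.$$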
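The closed-form expressions for $C(\tau)$ and $A(\tau)$ translate directly into code. The Python sketch below implements them as written above; the function names and the sample parameter values are assumptions made for illustration. In $\tau$ notation the ordinary differential equations read $C'(\tau) = 1 - \beta C - \frac{1}{2}\sigma^2 C^2$ and $A'(\tau) = \alpha C$ (since $\partial/\partial t = -\partial/\partial\tau$), which gives a simple finite-difference sanity check.

```python
import math

def cir_gamma(beta, sigma):
    """gamma = 0.5 * sqrt(beta^2 + 2 sigma^2)."""
    return 0.5 * math.sqrt(beta**2 + 2.0 * sigma**2)

def cir_C(tau, beta, sigma):
    """C(tau) = sinh(g tau) / (g cosh(g tau) + 0.5 beta sinh(g tau))."""
    g = cir_gamma(beta, sigma)
    return math.sinh(g * tau) / (g * math.cosh(g * tau) + 0.5 * beta * math.sinh(g * tau))

def cir_A(tau, alpha, beta, sigma):
    """A(tau) = -(2 alpha / sigma^2) log[g e^{beta tau / 2} / (g cosh + 0.5 beta sinh)]."""
    g = cir_gamma(beta, sigma)
    denom = g * math.cosh(g * tau) + 0.5 * beta * math.sinh(g * tau)
    return -(2.0 * alpha / sigma**2) * math.log(g * math.exp(0.5 * beta * tau) / denom)

def cir_bond_price(r, tau, alpha, beta, sigma):
    """B(r, tau) = exp(-r C(tau) - A(tau))."""
    return math.exp(-r * cir_C(tau, beta, sigma) - cir_A(tau, alpha, beta, sigma))

# Hypothetical parameters for a quick check of the Riccati equations at tau = 2.
alpha, beta, sigma, tau, h = 0.06, 1.0, 0.2, 2.0, 1e-5
C = cir_C(tau, beta, sigma)
dC = (cir_C(tau + h, beta, sigma) - cir_C(tau - h, beta, sigma)) / (2.0 * h)
dA = (cir_A(tau + h, alpha, beta, sigma) - cir_A(tau - h, alpha, beta, sigma)) / (2.0 * h)
riccati_error_C = dC - (1.0 - beta * C - 0.5 * sigma**2 * C**2)
riccati_error_A = dA - alpha * C
prices = [cir_bond_price(0.05, t, alpha, beta, sigma) for t in (0.0, 0.5, 1.0, 2.0, 5.0)]
```

For very large $\gamma\tau$ the hyperbolic functions overflow in floating point; a production implementation would rescale by $e^{-\gamma\tau}$, which this sketch does not attempt.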
We have

$$B(r(0),T) = \mathbb{E} \exp\left\{ -\int_0^T r(u)\, du \right\}.$$

Now $r(u) > 0$ for each $u$, almost surely, so $B(r(0),T)$ is strictly decreasing in $T$. Moreover,

$$\lim_{T \to \infty} B(r(0),T) = \mathbb{E} \exp\left\{ -\int_0^\infty r(u)\, du \right\} = 0.$$

But also,

$$B(r(0),T) = \exp\{ -r(0) C(T) - A(T) \},$$

so

$$r(0) C(0) + A(0) = 0, \qquad \lim_{T \to \infty} \left[ r(0) C(T) + A(T) \right] = \infty,$$

and

$$r(0) C(T) + A(T)$$

is strictly increasing in $T$.


31.5 Option on a bond 


The value at time $t$ of an option on a bond in the CIR model is

$$v(t, r(t)) = \mathbb{E} \left[ \exp\left\{ -\int_t^{T_1} r(u)\, du \right\} (B(T_1,T_2) - K)^+ \Big| \mathcal{F}(t) \right],$$

where $T_1$ is the expiration time of the option, $T_2$ is the maturity time of the bond, and $0 \le t \le T_1 < T_2$. As usual, $\exp\left\{ -\int_0^t r(u)\, du \right\} v(t, r(t))$ is a martingale, and this leads to the partial differential equation

$$-rv + v_t + (\alpha - \beta r) v_r + \tfrac{1}{2}\sigma^2 r v_{rr} = 0, \qquad 0 \le t < T_1, \; r \ge 0$$

(where $v = v(t,r)$). The terminal condition is

$$v(T_1, r) = (B(r, T_1, T_2) - K)^+, \qquad r \ge 0.$$


Other European derivative securities on the bond are priced using the same partial differential equa- 
tion with the terminal condition appropriate for the particular security. 


31.6 Deterministic time change of CIR model 


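Process time scale: In this time scale, the interest rate $r(t)$ is given by the constant coefficient CIR equation

$$dr(t) = (\alpha - \beta r(t))\, dt + \sigma \sqrt{r(t)}\, dW(t).$$

Real time scale: In this time scale, the interest rate $\hat r(\hat t)$ is given by a time-dependent CIR equation

$$d\hat r(\hat t) = \big( \hat\alpha(\hat t) - \hat\beta(\hat t)\, \hat r(\hat t) \big)\, d\hat t + \hat\sigma(\hat t) \sqrt{\hat r(\hat t)}\, d\hat W(\hat t).$$

Figure 31.3: Time change function. (Axes: $t$, process time; $\hat t$, real time. Annotation: a period of high interest rate volatility.)

There is a strictly increasing time change function $t = \varphi(\hat t)$ which relates the two time scales (see Fig. 31.3).

Let $\hat B(\hat r, \hat t, \hat T)$ denote the price at real time $\hat t$ of a bond with maturity $\hat T$ when the interest rate at time $\hat t$ is $\hat r$. We want to set things up so

$$\hat B(\hat r, \hat t, \hat T) = B(r,t,T) = e^{-rC(t,T) - A(t,T)},$$

where $t = \varphi(\hat t)$, $T = \varphi(\hat T)$, and $C(t,T)$ and $A(t,T)$ are as defined previously.

We need to determine the relationship between $\hat r$ and $r$. We have

$$B(r(0),0,T) = \mathbb{E} \exp\left\{ -\int_0^T r(t)\, dt \right\}, \qquad \hat B(\hat r(0),0,\hat T) = \mathbb{E} \exp\left\{ -\int_0^{\hat T} \hat r(\hat t)\, d\hat t \right\}.$$

With $T = \varphi(\hat T)$, make the change of variable $t = \varphi(\hat t)$, $dt = \varphi'(\hat t)\, d\hat t$ in the first integral to get

$$B(r(0),0,T) = \mathbb{E} \exp\left\{ -\int_0^{\hat T} r(\varphi(\hat t))\, \varphi'(\hat t)\, d\hat t \right\},$$

and this will be $\hat B(\hat r(0),0,\hat T)$ if we set

$$\hat r(\hat t) = r(\varphi(\hat t))\, \varphi'(\hat t).$$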


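31.7 Calibration

$$\begin{aligned}
\hat B(\hat r(\hat t), \hat t, \hat T) &= B\left( \frac{\hat r(\hat t)}{\varphi'(\hat t)}, \varphi(\hat t), \varphi(\hat T) \right) \\
&= \exp\left\{ -\hat r(\hat t)\, \frac{C(\varphi(\hat t), \varphi(\hat T))}{\varphi'(\hat t)} - A(\varphi(\hat t), \varphi(\hat T)) \right\} \\
&= \exp\left\{ -\hat r(\hat t)\, \hat C(\hat t, \hat T) - \hat A(\hat t, \hat T) \right\},
\end{aligned}$$

where

$$\hat C(\hat t, \hat T) = \frac{C(\varphi(\hat t), \varphi(\hat T))}{\varphi'(\hat t)}, \qquad \hat A(\hat t, \hat T) = A(\varphi(\hat t), \varphi(\hat T))$$

do not depend on $\hat t$ and $\hat T$ only through $\hat T - \hat t$, since, in the real time scale, the model coefficients are time dependent.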


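Suppose we know $\hat r(0)$ and $\hat B(\hat r(0),0,\hat T)$ for all $\hat T \in [0, \hat T^*]$. We calibrate by writing the equation

$$\hat B(\hat r(0),0,\hat T) = \exp\left\{ -\hat r(0)\, \hat C(0,\hat T) - \hat A(0,\hat T) \right\},$$

or equivalently,

$$-\log \hat B(\hat r(0),0,\hat T) = \frac{\hat r(0)}{\varphi'(0)}\, C(\varphi(0), \varphi(\hat T)) + A(\varphi(0), \varphi(\hat T)).$$

Take $\alpha$, $\beta$ and $\sigma$ so the equilibrium distribution of $r(t)$ seems reasonable. These values determine the functions $C$, $A$. Take $\varphi'(0) = 1$ (we justify this in the next section). For each $\hat T$, solve the equation for $\varphi(\hat T)$:

$$-\log \hat B(\hat r(0),0,\hat T) = \hat r(0)\, C(0, \varphi(\hat T)) + A(0, \varphi(\hat T)). \tag{*}$$

The right-hand side of this equation is increasing in the $\varphi(\hat T)$ variable, starting at 0 at time 0 and having limit $\infty$ at $\infty$, i.e.,

$$\hat r(0)\, C(0,0) + A(0,0) = 0, \qquad \lim_{T \to \infty} \left[ \hat r(0)\, C(0,T) + A(0,T) \right] = \infty.$$

Since $0 \le -\log \hat B(\hat r(0),0,\hat T) < \infty$, (*) has a unique solution for each $\hat T$. For $\hat T = 0$, this solution is $\varphi(0) = 0$. If $\hat T_1 < \hat T_2$, then

$$-\log \hat B(\hat r(0),0,\hat T_1) < -\log \hat B(\hat r(0),0,\hat T_2),$$

so $\varphi(\hat T_1) < \varphi(\hat T_2)$. Thus $\varphi$ is a strictly increasing time-change function with the right properties.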
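Because the right-hand side of (*) is increasing in $\varphi(\hat T)$, equation (*) can be solved by bisection for each maturity. The Python sketch below does this for a synthetic market curve; the "true" time change $\varphi(\hat T) = \hat T + 0.1\hat T^2$, the parameter values, and all function names are assumptions invented for the example, and bisection is one possible root finder rather than a method prescribed by the text.

```python
import math

def cir_C(tau, beta, sigma):
    g = 0.5 * math.sqrt(beta**2 + 2.0 * sigma**2)
    return math.sinh(g * tau) / (g * math.cosh(g * tau) + 0.5 * beta * math.sinh(g * tau))

def cir_A(tau, alpha, beta, sigma):
    g = 0.5 * math.sqrt(beta**2 + 2.0 * sigma**2)
    denom = g * math.cosh(g * tau) + 0.5 * beta * math.sinh(g * tau)
    return -(2.0 * alpha / sigma**2) * math.log(g * math.exp(0.5 * beta * tau) / denom)

def solve_time_change(neg_log_price, r0, alpha, beta, sigma, tol=1e-12):
    """Find phi >= 0 with r0*C(phi) + A(phi) = neg_log_price (equation (*)) by bisection."""
    def g(phi):
        return r0 * cir_C(phi, beta, sigma) + cir_A(phi, alpha, beta, sigma) - neg_log_price
    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:          # the left side is increasing and unbounded: widen bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic "market" data generated from a hypothetical time change with phi'(0) = 1.
alpha, beta, sigma, r0 = 0.06, 1.0, 0.2, 0.05
phi_true = lambda T_hat: T_hat + 0.1 * T_hat**2
maturities = [0.0, 0.5, 1.0, 2.0, 5.0]
neg_log_prices = [r0 * cir_C(phi_true(T), beta, sigma) + cir_A(phi_true(T), alpha, beta, sigma)
                  for T in maturities]
phi_recovered = [solve_time_change(y, r0, alpha, beta, sigma) for y in neg_log_prices]
```

With real data, `neg_log_prices` would instead be computed from observed bond prices; the recovered values are increasing in $\hat T$ whenever the observed prices are decreasing.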


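31.8 Tracking down $\varphi'(0)$ in the time change of the CIR model

Result for general term structure models:

$$-\frac{\partial}{\partial T} \log B(0,T) \Big|_{T=0} = r(0).$$

Justification:

$$B(0,T) = \mathbb{E} \exp\left\{ -\int_0^T r(u)\, du \right\},$$

$$-\log B(0,T) = -\log \mathbb{E} \exp\left\{ -\int_0^T r(u)\, du \right\},$$

$$-\frac{\partial}{\partial T} \log B(0,T) = \frac{\mathbb{E} \left[ r(T)\, e^{-\int_0^T r(u)\, du} \right]}{\mathbb{E}\, e^{-\int_0^T r(u)\, du}},$$

$$-\frac{\partial}{\partial T} \log B(0,T) \Big|_{T=0} = r(0).$$

In the real time scale associated with the calibration of CIR by time change, we write the bond price as

$$\hat B(\hat r(0),0,\hat T),$$

thereby indicating explicitly the initial interest rate. The above says that

$$-\frac{\partial}{\partial \hat T} \log \hat B(\hat r(0),0,\hat T) \Big|_{\hat T=0} = \hat r(0).$$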


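The calibration of CIR by time change requires that we find a strictly increasing function $\varphi$ with $\varphi(0) = 0$ such that

$$-\log \hat B(\hat r(0),0,\hat T) = \frac{1}{\varphi'(0)}\, \hat r(0)\, C(\varphi(\hat T)) + A(\varphi(\hat T)), \qquad \hat T \ge 0, \tag{cal}$$

where $\hat B(\hat r(0),0,\hat T)$, determined by market data, is strictly decreasing in $\hat T$, starts at 1 when $\hat T = 0$, and goes to zero as $\hat T \to \infty$. Therefore, $-\log \hat B(\hat r(0),0,\hat T)$ is as shown in Fig. 31.4.

Consider the function

$$\hat r(0)\, C(T) + A(T), \qquad T \ge 0.$$

Here $C(T)$ and $A(T)$ are given by

$$C(T) = \frac{\sinh(\gamma T)}{\gamma \cosh(\gamma T) + \frac{1}{2}\beta \sinh(\gamma T)},$$

$$A(T) = -\frac{2\alpha}{\sigma^2} \log \left[ \frac{\gamma e^{\frac{1}{2}\beta T}}{\gamma \cosh(\gamma T) + \frac{1}{2}\beta \sinh(\gamma T)} \right],$$

$$\gamma = \tfrac{1}{2} \sqrt{\beta^2 + 2\sigma^2}.$$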


Figure 31.4: Bond price in CIR model. (Plot of $-\log \hat B(\hat r(0),0,\hat T)$: strictly increasing, goes to $\infty$.)

Figure 31.5: Calibration. (Plot of $-\log \hat B(\hat r(0),0,\hat T)$.)


The function $\hat r(0)\, C(T) + A(T)$ is zero at $T = 0$, is strictly increasing in $T$, and goes to $\infty$ as $T \to \infty$. This is because the interest rate is positive in the CIR model (see last paragraph of Section 31.4).

To solve (cal), let us first consider the related equation

$$-\log \hat B(\hat r(0),0,\hat T) = \hat r(0)\, C(\varphi(\hat T)) + A(\varphi(\hat T)). \tag{cal'}$$

Fix $\hat T$ and define $\varphi(\hat T)$ to be the unique $T$ for which (see Fig. 31.5)

$$-\log \hat B(\hat r(0),0,\hat T) = \hat r(0)\, C(T) + A(T).$$

If $\hat T = 0$, then $\varphi(\hat T) = 0$. If $\hat T_1 < \hat T_2$, then $\varphi(\hat T_1) < \varphi(\hat T_2)$. As $\hat T \to \infty$, $\varphi(\hat T) \to \infty$. We have thus defined a time-change function $\varphi$ which has all the right properties, except it satisfies (cal') rather than (cal).


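We conclude by showing that $\varphi'(0) = 1$ so $\varphi$ also satisfies (cal). From (cal') we compute

$$\begin{aligned}
\hat r(0) &= -\frac{\partial}{\partial \hat T} \log \hat B(\hat r(0),0,\hat T) \Big|_{\hat T=0} \\
&= \hat r(0)\, C'(\varphi(0))\, \varphi'(0) + A'(\varphi(0))\, \varphi'(0) \\
&= \hat r(0)\, C'(0)\, \varphi'(0) + A'(0)\, \varphi'(0).
\end{aligned}$$

We show in a moment that $C'(0) = 1$, $A'(0) = 0$, so we have

$$\hat r(0) = \hat r(0)\, \varphi'(0).$$

Note that $\hat r(0)$ is the initial interest rate, observed in the market, and is strictly positive. Dividing by $\hat r(0)$, we obtain

$$\varphi'(0) = 1.$$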


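Computation of $C'(0)$:

$$C'(\tau) = \frac{1}{\left( \gamma \cosh(\gamma\tau) + \frac{1}{2}\beta \sinh(\gamma\tau) \right)^2} \left[ \gamma \cosh(\gamma\tau) \left( \gamma \cosh(\gamma\tau) + \tfrac{1}{2}\beta \sinh(\gamma\tau) \right) - \sinh(\gamma\tau) \left( \gamma^2 \sinh(\gamma\tau) + \tfrac{1}{2}\beta\gamma \cosh(\gamma\tau) \right) \right],$$

$$C'(0) = \frac{1}{\gamma^2} \left[ \gamma(\gamma + 0) - 0 \left( 0 + \tfrac{1}{2}\beta\gamma \right) \right] = 1.$$

Computation of $A'(0)$:

$$A'(\tau) = -\frac{2\alpha}{\sigma^2} \left[ \frac{\gamma \cosh(\gamma\tau) + \frac{1}{2}\beta \sinh(\gamma\tau)}{\gamma e^{\beta\tau/2}} \right] \times \frac{1}{\left( \gamma \cosh(\gamma\tau) + \frac{1}{2}\beta \sinh(\gamma\tau) \right)^2} \left[ \tfrac{1}{2}\beta\gamma e^{\beta\tau/2} \left( \gamma \cosh(\gamma\tau) + \tfrac{1}{2}\beta \sinh(\gamma\tau) \right) - \gamma e^{\beta\tau/2} \left( \gamma^2 \sinh(\gamma\tau) + \tfrac{1}{2}\beta\gamma \cosh(\gamma\tau) \right) \right],$$

$$A'(0) = -\frac{2\alpha}{\sigma^2} \left[ \frac{\gamma + 0}{\gamma} \right] \frac{1}{(\gamma + 0)^2} \left[ \tfrac{1}{2}\beta\gamma(\gamma + 0) - \gamma \left( 0 + \tfrac{1}{2}\beta\gamma \right) \right] = -\frac{2\alpha}{\sigma^2} \cdot \frac{1}{\gamma^2} \left[ \tfrac{1}{2}\beta\gamma^2 - \tfrac{1}{2}\beta\gamma^2 \right] = 0.$$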


Chapter 32 


A two-factor model (Duffie & Kan) 


Let us define: 


$X_1(t)$ = Interest rate at time $t$,

$X_2(t)$ = Yield at time $t$ on a bond maturing at time $t + \tau_0$.

Let $X_1(0) > 0$, $X_2(0) > 0$ be given, and let $X_1(t)$ and $X_2(t)$ be given by the coupled stochastic differential equations

$$dX_1(t) = (a_{11} X_1(t) + a_{12} X_2(t) + b_1)\, dt + \sigma_1 \sqrt{\beta_1 X_1(t) + \beta_2 X_2(t) + \alpha}\; dW_1(t), \tag{SDE1}$$

$$dX_2(t) = (a_{21} X_1(t) + a_{22} X_2(t) + b_2)\, dt + \sigma_2 \sqrt{\beta_1 X_1(t) + \beta_2 X_2(t) + \alpha}\; \left( \rho\, dW_1(t) + \sqrt{1 - \rho^2}\, dW_2(t) \right), \tag{SDE2}$$

where $W_1$ and $W_2$ are independent Brownian motions. To simplify notation, we define

$$Y(t) \triangleq \beta_1 X_1(t) + \beta_2 X_2(t) + \alpha,$$

$$W_3(t) \triangleq \rho W_1(t) + \sqrt{1 - \rho^2}\, W_2(t).$$

Then $W_3$ is a Brownian motion with

$$dW_1(t)\, dW_3(t) = \rho\, dt,$$

and

$$dX_1\, dX_1 = \sigma_1^2 Y\, dt, \qquad dX_2\, dX_2 = \sigma_2^2 Y\, dt, \qquad dX_1\, dX_2 = \rho \sigma_1 \sigma_2 Y\, dt.$$


32.1 Non-negativity of Y 


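$$\begin{aligned}
dY &= \beta_1\, dX_1 + \beta_2\, dX_2 \\
&= (\beta_1 a_{11} X_1 + \beta_1 a_{12} X_2 + \beta_1 b_1)\, dt + (\beta_2 a_{21} X_1 + \beta_2 a_{22} X_2 + \beta_2 b_2)\, dt \\
&\quad + \sqrt{Y} \left( \beta_1 \sigma_1\, dW_1 + \beta_2 \rho \sigma_2\, dW_1 + \beta_2 \sqrt{1 - \rho^2}\, \sigma_2\, dW_2 \right) \\
&= \left[ (\beta_1 a_{11} + \beta_2 a_{21}) X_1 + (\beta_1 a_{12} + \beta_2 a_{22}) X_2 \right] dt + (\beta_1 b_1 + \beta_2 b_2)\, dt \\
&\quad + \left( \beta_1^2 \sigma_1^2 + 2\beta_1 \beta_2 \rho \sigma_1 \sigma_2 + \beta_2^2 \sigma_2^2 \right)^{\frac{1}{2}} \sqrt{Y(t)}\, dW_4(t),
\end{aligned}$$

where

$$W_4(t) = \frac{(\beta_1 \sigma_1 + \beta_2 \rho \sigma_2) W_1(t) + \beta_2 \sqrt{1 - \rho^2}\, \sigma_2 W_2(t)}{\sqrt{\beta_1^2 \sigma_1^2 + 2\beta_1 \beta_2 \rho \sigma_1 \sigma_2 + \beta_2^2 \sigma_2^2}}$$

is a Brownian motion. We shall choose the parameters so that:

Assumption 1: For some $\gamma$, $\beta_1 a_{11} + \beta_2 a_{21} = \gamma \beta_1$, $\beta_1 a_{12} + \beta_2 a_{22} = \gamma \beta_2$.

Then

$$\begin{aligned}
dY &= \left[ \gamma \beta_1 X_1 + \gamma \beta_2 X_2 + \gamma \alpha \right] dt + (\beta_1 b_1 + \beta_2 b_2 - \gamma \alpha)\, dt + \left( \beta_1^2 \sigma_1^2 + 2\beta_1 \beta_2 \rho \sigma_1 \sigma_2 + \beta_2^2 \sigma_2^2 \right)^{\frac{1}{2}} \sqrt{Y}\, dW_4 \\
&= \gamma Y\, dt + (\beta_1 b_1 + \beta_2 b_2 - \gamma \alpha)\, dt + \left( \beta_1^2 \sigma_1^2 + 2\beta_1 \beta_2 \rho \sigma_1 \sigma_2 + \beta_2^2 \sigma_2^2 \right)^{\frac{1}{2}} \sqrt{Y}\, dW_4.
\end{aligned}$$

From our discussion of the CIR process, we recall that $Y$ will stay strictly positive provided that:

Assumption 2: $Y(0) = \beta_1 X_1(0) + \beta_2 X_2(0) + \alpha > 0$,

and

Assumption 3: $\beta_1 b_1 + \beta_2 b_2 - \gamma \alpha \ge \tfrac{1}{2} \left( \beta_1^2 \sigma_1^2 + 2\beta_1 \beta_2 \rho \sigma_1 \sigma_2 + \beta_2^2 \sigma_2^2 \right)$.

Under Assumptions 1, 2, and 3,

$$Y(t) > 0, \qquad 0 \le t < \infty, \quad \text{almost surely},$$

and (SDE1) and (SDE2) make sense. These can be rewritten as

$$dX_1(t) = (a_{11} X_1(t) + a_{12} X_2(t) + b_1)\, dt + \sigma_1 \sqrt{Y(t)}\, dW_1(t), \tag{SDE1'}$$

$$dX_2(t) = (a_{21} X_1(t) + a_{22} X_2(t) + b_2)\, dt + \sigma_2 \sqrt{Y(t)}\, dW_3(t). \tag{SDE2'}$$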
32.2 Zero-coupon bond prices 


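The value at time $t \le T$ of a zero-coupon bond paying \$1 at time $T$ is

$$B(t,T) = \mathbb{E} \left[ \exp\left\{ -\int_t^T X_1(u)\, du \right\} \Big| \mathcal{F}(t) \right].$$

Since the pair $(X_1, X_2)$ of processes is Markov, this is random only through a dependence on $X_1(t), X_2(t)$. Since the coefficients in (SDE1) and (SDE2) do not depend on time, the bond price depends on $t$ and $T$ only through their difference $\tau = T - t$. Thus, there is a function $B(x_1, x_2, \tau)$ of the dummy variables $x_1, x_2$ and $\tau$, so that

$$B(X_1(t), X_2(t), T - t) = \mathbb{E} \left[ \exp\left\{ -\int_t^T X_1(u)\, du \right\} \Big| \mathcal{F}(t) \right].$$

The usual tower property argument shows that

$$\exp\left\{ -\int_0^t X_1(u)\, du \right\} B(X_1(t), X_2(t), T - t)$$

is a martingale. We compute its stochastic differential and set the $dt$ term equal to zero.

$$\begin{aligned}
&d\left( \exp\left\{ -\int_0^t X_1(u)\, du \right\} B(X_1(t), X_2(t), T - t) \right) \\
&\quad = \exp\left\{ -\int_0^t X_1(u)\, du \right\} \Big[ -X_1 B\, dt + B_{x_1}\, dX_1 + B_{x_2}\, dX_2 - B_\tau\, dt \\
&\qquad\qquad + \tfrac{1}{2} B_{x_1 x_1}\, dX_1\, dX_1 + B_{x_1 x_2}\, dX_1\, dX_2 + \tfrac{1}{2} B_{x_2 x_2}\, dX_2\, dX_2 \Big] \\
&\quad = \exp\left\{ -\int_0^t X_1(u)\, du \right\} \Big[ \Big( -X_1 B + (a_{11} X_1 + a_{12} X_2 + b_1) B_{x_1} + (a_{21} X_1 + a_{22} X_2 + b_2) B_{x_2} - B_\tau \\
&\qquad\qquad + \tfrac{1}{2}\sigma_1^2 Y B_{x_1 x_1} + \rho \sigma_1 \sigma_2 Y B_{x_1 x_2} + \tfrac{1}{2}\sigma_2^2 Y B_{x_2 x_2} \Big)\, dt \\
&\qquad\qquad + \sigma_1 \sqrt{Y} B_{x_1}\, dW_1 + \sigma_2 \sqrt{Y} B_{x_2}\, dW_3 \Big].
\end{aligned}$$

The partial differential equation for $B(x_1, x_2, \tau)$ is

$$\begin{aligned}
&-x_1 B - B_\tau + (a_{11} x_1 + a_{12} x_2 + b_1) B_{x_1} + (a_{21} x_1 + a_{22} x_2 + b_2) B_{x_2} + \tfrac{1}{2}\sigma_1^2 (\beta_1 x_1 + \beta_2 x_2 + \alpha) B_{x_1 x_1} \\
&\quad + \rho \sigma_1 \sigma_2 (\beta_1 x_1 + \beta_2 x_2 + \alpha) B_{x_1 x_2} + \tfrac{1}{2}\sigma_2^2 (\beta_1 x_1 + \beta_2 x_2 + \alpha) B_{x_2 x_2} = 0.
\end{aligned} \tag{PDE}$$

We seek a solution of the form

$$B(x_1, x_2, \tau) = \exp\{ -x_1 C_1(\tau) - x_2 C_2(\tau) - A(\tau) \},$$

valid for all $\tau \ge 0$ and all $x_1, x_2$ satisfying

$$\beta_1 x_1 + \beta_2 x_2 + \alpha > 0. \tag{*}$$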


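We must have

$$B(x_1, x_2, 0) = 1, \qquad \forall x_1, x_2 \text{ satisfying (*)},$$

because $\tau = 0$ corresponds to $t = T$. This implies the initial conditions

$$C_1(0) = C_2(0) = A(0) = 0. \tag{IC}$$

We want to find $C_1(\tau), C_2(\tau), A(\tau)$ for $\tau > 0$. We have

$$\begin{aligned}
B_\tau(x_1, x_2, \tau) &= \left[ -x_1 C_1'(\tau) - x_2 C_2'(\tau) - A'(\tau) \right] B(x_1, x_2, \tau), \\
B_{x_1}(x_1, x_2, \tau) &= -C_1(\tau) B(x_1, x_2, \tau), \\
B_{x_2}(x_1, x_2, \tau) &= -C_2(\tau) B(x_1, x_2, \tau), \\
B_{x_1 x_1}(x_1, x_2, \tau) &= C_1^2(\tau) B(x_1, x_2, \tau), \\
B_{x_1 x_2}(x_1, x_2, \tau) &= C_1(\tau) C_2(\tau) B(x_1, x_2, \tau), \\
B_{x_2 x_2}(x_1, x_2, \tau) &= C_2^2(\tau) B(x_1, x_2, \tau).
\end{aligned}$$

(PDE) becomes

$$\begin{aligned}
0 &= B(x_1, x_2, \tau) \Big[ -x_1 + x_1 C_1'(\tau) + x_2 C_2'(\tau) + A'(\tau) - (a_{11} x_1 + a_{12} x_2 + b_1) C_1(\tau) \\
&\qquad - (a_{21} x_1 + a_{22} x_2 + b_2) C_2(\tau) \\
&\qquad + \tfrac{1}{2}\sigma_1^2 (\beta_1 x_1 + \beta_2 x_2 + \alpha) C_1^2(\tau) + \rho \sigma_1 \sigma_2 (\beta_1 x_1 + \beta_2 x_2 + \alpha) C_1(\tau) C_2(\tau) \\
&\qquad + \tfrac{1}{2}\sigma_2^2 (\beta_1 x_1 + \beta_2 x_2 + \alpha) C_2^2(\tau) \Big] \\
&= x_1 B(x_1, x_2, \tau) \Big[ -1 + C_1'(\tau) - a_{11} C_1(\tau) - a_{21} C_2(\tau) \\
&\qquad + \tfrac{1}{2}\sigma_1^2 \beta_1 C_1^2(\tau) + \rho \sigma_1 \sigma_2 \beta_1 C_1(\tau) C_2(\tau) + \tfrac{1}{2}\sigma_2^2 \beta_1 C_2^2(\tau) \Big] \\
&\quad + x_2 B(x_1, x_2, \tau) \Big[ C_2'(\tau) - a_{12} C_1(\tau) - a_{22} C_2(\tau) \\
&\qquad + \tfrac{1}{2}\sigma_1^2 \beta_2 C_1^2(\tau) + \rho \sigma_1 \sigma_2 \beta_2 C_1(\tau) C_2(\tau) + \tfrac{1}{2}\sigma_2^2 \beta_2 C_2^2(\tau) \Big] \\
&\quad + B(x_1, x_2, \tau) \Big[ A'(\tau) - b_1 C_1(\tau) - b_2 C_2(\tau) \\
&\qquad + \tfrac{1}{2}\sigma_1^2 \alpha C_1^2(\tau) + \rho \sigma_1 \sigma_2 \alpha C_1(\tau) C_2(\tau) + \tfrac{1}{2}\sigma_2^2 \alpha C_2^2(\tau) \Big].
\end{aligned}$$

We get three equations:

$$C_1'(\tau) = 1 + a_{11} C_1(\tau) + a_{21} C_2(\tau) - \tfrac{1}{2}\sigma_1^2 \beta_1 C_1^2(\tau) - \rho \sigma_1 \sigma_2 \beta_1 C_1(\tau) C_2(\tau) - \tfrac{1}{2}\sigma_2^2 \beta_1 C_2^2(\tau), \qquad C_1(0) = 0; \tag{1}$$

$$C_2'(\tau) = a_{12} C_1(\tau) + a_{22} C_2(\tau) - \tfrac{1}{2}\sigma_1^2 \beta_2 C_1^2(\tau) - \rho \sigma_1 \sigma_2 \beta_2 C_1(\tau) C_2(\tau) - \tfrac{1}{2}\sigma_2^2 \beta_2 C_2^2(\tau), \qquad C_2(0) = 0; \tag{2}$$

$$A'(\tau) = b_1 C_1(\tau) + b_2 C_2(\tau) - \tfrac{1}{2}\sigma_1^2 \alpha C_1^2(\tau) - \rho \sigma_1 \sigma_2 \alpha C_1(\tau) C_2(\tau) - \tfrac{1}{2}\sigma_2^2 \alpha C_2^2(\tau), \qquad A(0) = 0. \tag{3}$$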


We first solve (1) and (2) simultaneously numerically, and then integrate (3) to obtain the function $A(\tau)$.
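One way to carry out this numerical step is a classical fourth-order Runge-Kutta scheme applied to (1), (2) and (3) together. The Python sketch below is an illustration under stated assumptions: the dictionary keys, the step count, and the test parameters are hypothetical, and the text does not prescribe a particular integrator. As a sanity check, the sketch uses a degenerate parameter set ($\beta_2 = 0$, $a_{12} = a_{21} = 0$, $\alpha = 0$, $\beta_1 = 1$) for which $X_1$ is a CIR process, so $C_1$ and $A$ can be compared with the closed-form CIR expressions of Section 31.4.

```python
import math

def riccati_rhs(c1, c2, p):
    """Right-hand sides of equations (1), (2), (3); none of them depends on A."""
    q = (0.5 * p["sigma1"]**2 * c1 * c1
         + p["rho"] * p["sigma1"] * p["sigma2"] * c1 * c2
         + 0.5 * p["sigma2"]**2 * c2 * c2)
    dc1 = 1.0 + p["a11"] * c1 + p["a21"] * c2 - p["beta1"] * q
    dc2 = p["a12"] * c1 + p["a22"] * c2 - p["beta2"] * q
    da = p["b1"] * c1 + p["b2"] * c2 - p["alpha"] * q
    return dc1, dc2, da

def solve_duffie_kan(tau, p, steps=2000):
    """RK4 integration from the initial conditions C1(0) = C2(0) = A(0) = 0."""
    h = tau / steps
    c1 = c2 = a = 0.0
    for _ in range(steps):
        k1 = riccati_rhs(c1, c2, p)
        k2 = riccati_rhs(c1 + 0.5 * h * k1[0], c2 + 0.5 * h * k1[1], p)
        k3 = riccati_rhs(c1 + 0.5 * h * k2[0], c2 + 0.5 * h * k2[1], p)
        k4 = riccati_rhs(c1 + h * k3[0], c2 + h * k3[1], p)
        c1 += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        c2 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        a += h * (k1[2] + 2.0 * k2[2] + 2.0 * k3[2] + k4[2]) / 6.0
    return c1, c2, a

# Hypothetical degenerate case: X_1 is CIR with alpha_cir = 0.06, beta = 1, sigma = 0.2.
beta, sigma, alpha_cir, tau = 1.0, 0.2, 0.06, 2.0
cir_like = dict(a11=-beta, a12=0.0, b1=alpha_cir, a21=0.0, a22=-0.5, b2=0.0,
                beta1=1.0, beta2=0.0, alpha=0.0, sigma1=sigma, rho=0.0, sigma2=0.1)
c1, c2, a = solve_duffie_kan(tau, cir_like)

# Closed-form CIR values for comparison.
g = 0.5 * math.sqrt(beta**2 + 2.0 * sigma**2)
den = g * math.cosh(g * tau) + 0.5 * beta * math.sinh(g * tau)
c_closed = math.sinh(g * tau) / den
a_closed = -(2.0 * alpha_cir / sigma**2) * math.log(g * math.exp(0.5 * beta * tau) / den)
```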


32.3 Calibration 


Let $\tau_0 > 0$ be given. The value at time $t$ of a bond maturing at time $t + \tau_0$ is

$$B(X_1(t), X_2(t), \tau_0) = \exp\{ -X_1(t) C_1(\tau_0) - X_2(t) C_2(\tau_0) - A(\tau_0) \}$$

and the yield is

$$-\frac{1}{\tau_0} \log B(X_1(t), X_2(t), \tau_0) = \frac{1}{\tau_0} \left[ X_1(t) C_1(\tau_0) + X_2(t) C_2(\tau_0) + A(\tau_0) \right].$$

But we have set up the model so that $X_2(t)$ is the yield at time $t$ of a bond maturing at time $t + \tau_0$. Thus

$$X_2(t) = \frac{1}{\tau_0} \left[ X_1(t) C_1(\tau_0) + X_2(t) C_2(\tau_0) + A(\tau_0) \right].$$

This equation must hold for every value of $X_1(t)$ and $X_2(t)$, which implies that

$$C_1(\tau_0) = 0, \qquad C_2(\tau_0) = \tau_0, \qquad A(\tau_0) = 0.$$

We must choose the parameters

$$a_{11}, a_{12}, b_1; \quad a_{21}, a_{22}, b_2; \quad \beta_1, \beta_2, \alpha; \quad \sigma_1, \rho, \sigma_2;$$

so that these three equations are satisfied.


Chapter 33 


Change of numéraire 


Consider a Brownian motion driven market model with time horizon $T^*$. For now, we will have one asset, which we call a "stock" even though in applications it will usually be an interest rate dependent claim. The price of the stock is modeled by

$$dS(t) = r(t) S(t)\, dt + \sigma(t) S(t)\, dW(t), \tag{0.1}$$

where the interest rate process $r(t)$ and the volatility process $\sigma(t)$ are adapted to some filtration $\{\mathcal{F}(t);\ 0 \le t \le T^*\}$. $W$ is a Brownian motion relative to this filtration, but $\{\mathcal{F}(t);\ 0 \le t \le T^*\}$ may be larger than the filtration generated by $W$.


This is not a geometric Brownian motion model. We are particularly interested in the case that the 
interest rate is stochastic, given by a term structure model we have not yet specified. 


We shall work only under the risk-neutral measure, which is reflected by the fact that the mean rate 
of return for the stock is r(t). 


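We define the accumulation factor

$$\beta(t) = \exp\left\{ \int_0^t r(u)\, du \right\},$$

so that the discounted stock price $\frac{S(t)}{\beta(t)}$ is a martingale. Indeed,

$$d\left( \frac{S(t)}{\beta(t)} \right) = \frac{S(t)}{\beta(t)}\, \sigma(t)\, dW(t).$$

The zero-coupon bond prices are given by

$$B(t,T) = \mathbb{E} \left[ \exp\left\{ -\int_t^T r(u)\, du \right\} \Big| \mathcal{F}(t) \right] = \mathbb{E} \left[ \frac{\beta(t)}{\beta(T)} \Big| \mathcal{F}(t) \right],$$

so

$$\frac{B(t,T)}{\beta(t)} = \mathbb{E} \left[ \frac{1}{\beta(T)} \Big| \mathcal{F}(t) \right]$$

is also a martingale (tower property).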


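The $T$-forward price $F(t,T)$ of the stock is the price set at time $t$ for delivery of one share of stock at time $T$ with payment at time $T$. The value of the forward contract at time $t$ is zero, so

$$\begin{aligned}
0 &= \mathbb{E} \left[ \frac{\beta(t)}{\beta(T)} \big( S(T) - F(t,T) \big) \Big| \mathcal{F}(t) \right] \\
&= \beta(t)\, \mathbb{E} \left[ \frac{S(T)}{\beta(T)} \Big| \mathcal{F}(t) \right] - F(t,T)\, \mathbb{E} \left[ \frac{\beta(t)}{\beta(T)} \Big| \mathcal{F}(t) \right] \\
&= \beta(t) \frac{S(t)}{\beta(t)} - F(t,T) B(t,T) \\
&= S(t) - F(t,T) B(t,T).
\end{aligned}$$

Therefore,

$$F(t,T) = \frac{S(t)}{B(t,T)}.$$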


Definition 33.1 (Numéraire) Any asset in the model whose price is always strictly positive can be taken as the numéraire. We then denominate all other assets in units of this numéraire.

Example 33.1 (Money market as numéraire) The money market could be the numéraire. At time $t$, the stock is worth $\frac{S(t)}{\beta(t)}$ units of money market and the $T$-maturity bond is worth $\frac{B(t,T)}{\beta(t)}$ units of money market.

Example 33.2 (Bond as numéraire) The $T$-maturity bond could be the numéraire. At time $t \le T$, the stock is worth $F(t,T)$ units of $T$-maturity bond and the $T$-maturity bond is worth 1 unit.

We will say that a probability measure $\mathbb{P}_N$ is risk-neutral for the numéraire $N$ if every asset price, divided by $N$, is a martingale under $\mathbb{P}_N$. The original probability measure $\mathbb{P}$ is risk-neutral for the numéraire $\beta$ (Example 33.1).

Theorem 0.71 Let $N$ be a numéraire, i.e., the price process for some asset whose price is always strictly positive. Then $\mathbb{P}_N$ defined by

$$\mathbb{P}_N(A) = \frac{1}{N(0)} \int_A \frac{N(T^*)}{\beta(T^*)}\, d\mathbb{P}, \qquad \forall A \in \mathcal{F}(T^*),$$

is risk-neutral for $N$.


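Note: $\mathbb{P}$ and $\mathbb{P}_N$ are equivalent, i.e., have the same probability zero sets, and

$$\mathbb{P}(A) = N(0) \int_A \frac{\beta(T^*)}{N(T^*)}\, d\mathbb{P}_N, \qquad \forall A \in \mathcal{F}(T^*).$$

Proof: Because $N$ is the price process for some asset, $N/\beta$ is a martingale under $\mathbb{P}$. Therefore,

$$\mathbb{P}_N(\Omega) = \frac{1}{N(0)} \int_\Omega \frac{N(T^*)}{\beta(T^*)}\, d\mathbb{P} = \frac{1}{N(0)}\, \mathbb{E} \left[ \frac{N(T^*)}{\beta(T^*)} \right] = \frac{1}{N(0)} \cdot \frac{N(0)}{\beta(0)} = 1,$$

and we see that $\mathbb{P}_N$ is a probability measure.

Let $Y$ be an asset price. Under $\mathbb{P}$, $Y/\beta$ is a martingale. We must show that under $\mathbb{P}_N$, $Y/N$ is a martingale. For this, we need to recall how to combine conditional expectations with change of measure (Lemma 1.54). If $0 \le t \le T \le T^*$ and $X$ is $\mathcal{F}(T)$-measurable, then

$$\mathbb{E}_N [X | \mathcal{F}(t)] = \frac{N(0) \beta(t)}{N(t)}\, \mathbb{E} \left[ \frac{N(T)}{N(0) \beta(T)} X \Big| \mathcal{F}(t) \right] = \frac{\beta(t)}{N(t)}\, \mathbb{E} \left[ \frac{N(T)}{\beta(T)} X \Big| \mathcal{F}(t) \right].$$

Therefore,

$$\mathbb{E}_N \left[ \frac{Y(T)}{N(T)} \Big| \mathcal{F}(t) \right] = \frac{\beta(t)}{N(t)}\, \mathbb{E} \left[ \frac{N(T)}{\beta(T)} \cdot \frac{Y(T)}{N(T)} \Big| \mathcal{F}(t) \right] = \frac{\beta(t)}{N(t)} \cdot \frac{Y(t)}{\beta(t)} = \frac{Y(t)}{N(t)},$$

which is the martingale property for $Y/N$ under $\mathbb{P}_N$.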


33.1 Bond price as numéraire 


Fix $T \in (0, T^*]$ and let $B(t,T)$ be the numéraire. The risk-neutral measure for this numéraire is

$$\mathbb{P}_T(A) = \frac{1}{B(0,T)} \int_A \frac{B(T,T)}{\beta(T)}\, d\mathbb{P} = \frac{1}{B(0,T)} \int_A \frac{1}{\beta(T)}\, d\mathbb{P}, \qquad \forall A \in \mathcal{F}(T).$$

Because this bond is not defined after time $T$, we change the measure only "up to time $T$", i.e., using $\frac{1}{B(0,T)} \frac{B(T,T)}{\beta(T)}$ and only for $A \in \mathcal{F}(T)$.

$\mathbb{P}_T$ is called the $T$-forward measure. Denominated in units of $T$-maturity bond, the value of the stock is

$$F(t,T) = \frac{S(t)}{B(t,T)}, \qquad 0 \le t \le T.$$

This is a martingale under $\mathbb{P}_T$, and so has a differential of the form

$$dF(t,T) = \sigma_F(t,T) F(t,T)\, dW_T(t), \qquad 0 \le t \le T, \tag{1.1}$$

i.e., a differential without a $dt$ term. The process $\{W_T;\ 0 \le t \le T\}$ is a Brownian motion under $\mathbb{P}_T$. We may assume without loss of generality that $\sigma_F(t,T) \ge 0$.

We write $F(t)$ rather than $F(t,T)$ from now on.


33.2 Stock price as numéraire 


Let $S(t)$ be the numéraire. In terms of this numéraire, the stock price is identically 1. The risk-neutral measure under this numéraire is

$$\mathbb{P}_S(A) = \frac{1}{S(0)} \int_A \frac{S(T^*)}{\beta(T^*)}\, d\mathbb{P}, \qquad \forall A \in \mathcal{F}(T^*).$$

Denominated in shares of stock, the value of the $T$-maturity bond is

$$\frac{B(t,T)}{S(t)} = \frac{1}{F(t)}.$$

This is a martingale under $\mathbb{P}_S$, and so has a differential of the form

$$d\left( \frac{1}{F(t)} \right) = \gamma(t,T) \left( \frac{1}{F(t)} \right) dW_S(t), \tag{2.1}$$

where $\{W_S(t);\ 0 \le t \le T^*\}$ is a Brownian motion under $\mathbb{P}_S$. We may assume without loss of generality that $\gamma(t,T) \ge 0$.

Theorem 2.72 The volatility $\gamma(t,T)$ in (2.1) is equal to the volatility $\sigma_F(t,T)$ in (1.1). In other words, (2.1) can be rewritten as

$$d\left( \frac{1}{F(t)} \right) = \sigma_F(t,T) \left( \frac{1}{F(t)} \right) dW_S(t). \tag{2.1'}$$


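Proof: Let $g(x) = 1/x$, so $g'(x) = -1/x^2$, $g''(x) = 2/x^3$. Then

$$\begin{aligned}
d\left( \frac{1}{F(t)} \right) &= dg(F(t)) \\
&= g'(F(t))\, dF(t) + \tfrac{1}{2} g''(F(t))\, dF(t)\, dF(t) \\
&= -\frac{1}{F^2(t)}\, \sigma_F(t,T) F(t)\, dW_T(t) + \frac{1}{F^3(t)}\, \sigma_F^2(t,T) F^2(t)\, dt \\
&= \frac{1}{F(t)} \left[ -\sigma_F(t,T)\, dW_T(t) + \sigma_F^2(t,T)\, dt \right] \\
&= \sigma_F(t,T) \left( \frac{1}{F(t)} \right) \left[ -dW_T(t) + \sigma_F(t,T)\, dt \right].
\end{aligned}$$

Under $\mathbb{P}_T$, $-W_T$ is a Brownian motion. Under this measure, $\frac{1}{F(t)}$ has volatility $\sigma_F(t,T)$ and mean rate of return $\sigma_F^2(t,T)$. The change of measure from $\mathbb{P}_T$ to $\mathbb{P}_S$ makes $\frac{1}{F(t)}$ a martingale, i.e., it changes the mean return to zero, but the change of measure does not affect the volatility. Therefore, $\gamma(t,T)$ in (2.1) must be $\sigma_F(t,T)$ and $W_S$ must be

$$W_S(t) = -W_T(t) + \int_0^t \sigma_F(u,T)\, du.$$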


33.3 Merton option pricing formula 


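The price at time zero of a European call is

$$\begin{aligned}
V(0) &= \mathbb{E} \left[ \frac{1}{\beta(T)} (S(T) - K)^+ \right] \\
&= \mathbb{E} \left[ \frac{S(T)}{\beta(T)} \mathbf{1}_{\{S(T) > K\}} \right] - K\, \mathbb{E} \left[ \frac{1}{\beta(T)} \mathbf{1}_{\{S(T) > K\}} \right] \\
&= S(0) \int_{\{S(T) > K\}} \frac{S(T)}{S(0) \beta(T)}\, d\mathbb{P} - K B(0,T) \int_{\{S(T) > K\}} \frac{1}{B(0,T) \beta(T)}\, d\mathbb{P} \\
&= S(0)\, \mathbb{P}_S\{S(T) > K\} - K B(0,T)\, \mathbb{P}_T\{S(T) > K\} \\
&= S(0)\, \mathbb{P}_S\{F(T) > K\} - K B(0,T)\, \mathbb{P}_T\{F(T) > K\} \\
&= S(0)\, \mathbb{P}_S\left\{ \frac{1}{F(T)} < \frac{1}{K} \right\} - K B(0,T)\, \mathbb{P}_T\{F(T) > K\}.
\end{aligned}$$

This is a completely general formula which permits computation as soon as we specify $\sigma_F(t,T)$. If we assume that $\sigma_F(t,T)$ is a constant $\sigma_F$, we have the following:

$$\frac{1}{F(T)} = \frac{B(0,T)}{S(0)} \exp\left\{ \sigma_F W_S(T) - \tfrac{1}{2}\sigma_F^2 T \right\},$$

$$\begin{aligned}
\mathbb{P}_S\left\{ \frac{1}{F(T)} < \frac{1}{K} \right\} &= \mathbb{P}_S\left\{ \sigma_F W_S(T) - \tfrac{1}{2}\sigma_F^2 T < \log \frac{S(0)}{K B(0,T)} \right\} \\
&= \mathbb{P}_S\left\{ \frac{W_S(T)}{\sqrt{T}} < \frac{1}{\sigma_F \sqrt{T}} \log \frac{S(0)}{K B(0,T)} + \tfrac{1}{2}\sigma_F \sqrt{T} \right\} \\
&= N(\rho_1),
\end{aligned}$$

where

$$\rho_1 = \frac{1}{\sigma_F \sqrt{T}} \left[ \log \frac{S(0)}{K B(0,T)} + \tfrac{1}{2}\sigma_F^2 T \right].$$

Similarly,

$$F(T) = \frac{S(0)}{B(0,T)} \exp\left\{ \sigma_F W_T(T) - \tfrac{1}{2}\sigma_F^2 T \right\},$$

$$\begin{aligned}
\mathbb{P}_T\{F(T) > K\} &= \mathbb{P}_T\left\{ \sigma_F W_T(T) - \tfrac{1}{2}\sigma_F^2 T > \log \frac{K B(0,T)}{S(0)} \right\} \\
&= \mathbb{P}_T\left\{ \frac{W_T(T)}{\sqrt{T}} > \frac{1}{\sigma_F \sqrt{T}} \left[ \log \frac{K B(0,T)}{S(0)} + \tfrac{1}{2}\sigma_F^2 T \right] \right\} \\
&= \mathbb{P}_T\left\{ \frac{-W_T(T)}{\sqrt{T}} < \frac{1}{\sigma_F \sqrt{T}} \left[ \log \frac{S(0)}{K B(0,T)} - \tfrac{1}{2}\sigma_F^2 T \right] \right\} \\
&= N(\rho_2),
\end{aligned}$$

where

$$\rho_2 = \frac{1}{\sigma_F \sqrt{T}} \left[ \log \frac{S(0)}{K B(0,T)} - \tfrac{1}{2}\sigma_F^2 T \right].$$

If $r$ is constant, then $B(0,T) = e^{-rT}$,

$$\rho_1 = \frac{1}{\sigma_F \sqrt{T}} \left[ \log \frac{S(0)}{K} + \left( r + \tfrac{1}{2}\sigma_F^2 \right) T \right], \qquad \rho_2 = \frac{1}{\sigma_F \sqrt{T}} \left[ \log \frac{S(0)}{K} + \left( r - \tfrac{1}{2}\sigma_F^2 \right) T \right],$$

and we have the usual Black-Scholes formula. When $r$ is not constant, we still have the explicit formula

$$V(0) = S(0) N(\rho_1) - K B(0,T) N(\rho_2).$$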
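The explicit formula is easy to evaluate once $S(0)$, $K$, $B(0,T)$, a constant forward-price volatility $\sigma_F$, and $T$ are given. The Python sketch below is a minimal illustration; the function names and the numerical inputs are assumptions, and the standard normal distribution function is computed from `math.erf`. With a constant rate, so that $B(0,T) = e^{-rT}$, the result should coincide with the usual Black-Scholes value.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_call(s0, strike, bond_price, sigma_f, maturity):
    """V(0) = S(0) N(rho_1) - K B(0,T) N(rho_2) with constant forward volatility sigma_F."""
    total_vol = sigma_f * math.sqrt(maturity)
    log_moneyness = math.log(s0 / (strike * bond_price))
    rho1 = (log_moneyness + 0.5 * total_vol**2) / total_vol
    rho2 = (log_moneyness - 0.5 * total_vol**2) / total_vol
    return s0 * norm_cdf(rho1) - strike * bond_price * norm_cdf(rho2)

# Hypothetical inputs: constant rate r = 5%, so B(0,T) = exp(-r T).
s0, strike, r, sigma_f, maturity = 100.0, 100.0, 0.05, 0.2, 1.0
bond_price = math.exp(-r * maturity)
call_value = merton_call(s0, strike, bond_price, sigma_f, maturity)
```

The only interest-rate input is the bond price $B(0,T)$, which is why the same function applies when $r$ is stochastic, provided $\sigma_F$ is constant.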


As this formula suggests, if $\sigma_F$ is constant, then for $0 \le t \le T$, the value of a European call expiring at time $T$ is

$$V(t) = S(t) N(\rho_1(t)) - K B(t,T) N(\rho_2(t)),$$

where

$$\rho_1(t) = \frac{1}{\sigma_F \sqrt{T - t}} \left[ \log \frac{F(t)}{K} + \tfrac{1}{2}\sigma_F^2 (T - t) \right],$$

$$\rho_2(t) = \frac{1}{\sigma_F \sqrt{T - t}} \left[ \log \frac{F(t)}{K} - \tfrac{1}{2}\sigma_F^2 (T - t) \right].$$

This formula also suggests a hedge: at each time $t$, hold $N(\rho_1(t))$ shares of stock and short $K N(\rho_2(t))$ bonds.

We want to verify that this hedge is self-financing. Suppose we begin with capital $V(0)$ and at each time $t$ hold $N(\rho_1(t))$ shares of stock. We short bonds as necessary to finance this. Will the position in the bond always be $-K N(\rho_2(t))$? If so, the value of the portfolio will always be

$$S(t) N(\rho_1(t)) - K B(t,T) N(\rho_2(t)) = V(t),$$

and we will have a hedge.

Mathematically, this question takes the following form. Let

$$\Delta(t) = N(\rho_1(t)).$$

At time $t$, hold $\Delta(t)$ shares of stock. If $X(t)$ is the value of the portfolio at time $t$, then $X(t) - \Delta(t) S(t)$ will be invested in the bond, so the number of bonds owned is $\frac{X(t) - \Delta(t) S(t)}{B(t,T)}$ and the portfolio value evolves according to

$$dX(t) = \Delta(t)\, dS(t) + \frac{X(t) - \Delta(t) S(t)}{B(t,T)}\, dB(t,T). \tag{3.1}$$

The value of the option evolves according to

$$\begin{aligned}
dV(t) &= N(\rho_1(t))\, dS(t) + S(t)\, dN(\rho_1(t)) + dS(t)\, dN(\rho_1(t)) \\
&\quad - K N(\rho_2(t))\, dB(t,T) - K\, dB(t,T)\, dN(\rho_2(t)) - K B(t,T)\, dN(\rho_2(t)).
\end{aligned} \tag{3.2}$$

If $X(0) = V(0)$, will $X(t) = V(t)$ for $0 \le t \le T$?

Formulas (3.1) and (3.2) are difficult to compare, so we simplify them by a change of numéraire. This change is justified by the following theorem.


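Theorem 3.73 Changes of numéraire affect portfolio values in the way you would expect.

Proof: Suppose we have a model with $k$ assets with prices $S_1, S_2, \dots, S_k$. At each time $t$, hold $\Delta_i(t)$ shares of asset $i$, $i = 1, 2, \dots, k-1$, and invest the remaining wealth in asset $k$. Begin with a nonrandom initial wealth $X(0)$, and let $X(t)$ be the value of the portfolio at time $t$. The number of shares of asset $k$ held at time $t$ is

$$\Delta_k(t) = \frac{X(t) - \sum_{i=1}^{k-1} \Delta_i(t) S_i(t)}{S_k(t)},$$

and $X$ evolves according to the equation

$$dX = \sum_{i=1}^{k-1} \Delta_i\, dS_i + \left( X - \sum_{i=1}^{k-1} \Delta_i S_i \right) \frac{dS_k}{S_k} = \sum_{i=1}^{k} \Delta_i\, dS_i.$$

Note that

$$X(t) = \sum_{i=1}^{k} \Delta_i(t) S_i(t),$$

and we only get to specify $\Delta_1, \dots, \Delta_{k-1}$, not $\Delta_k$, in advance.

Let $N$ be a numéraire, and define

$$\hat X(t) = \frac{X(t)}{N(t)}, \qquad \hat S_i(t) = \frac{S_i(t)}{N(t)}, \quad i = 1, 2, \dots, k.$$

Then

$$\begin{aligned}
d\hat X &= \frac{1}{N}\, dX + X\, d\left( \frac{1}{N} \right) + dX\, d\left( \frac{1}{N} \right) \\
&= \frac{1}{N} \sum_{i=1}^{k} \Delta_i\, dS_i + \left( \sum_{i=1}^{k} \Delta_i S_i \right) d\left( \frac{1}{N} \right) + \sum_{i=1}^{k} \Delta_i\, dS_i\, d\left( \frac{1}{N} \right) \\
&= \sum_{i=1}^{k} \Delta_i \left( \frac{1}{N}\, dS_i + S_i\, d\left( \frac{1}{N} \right) + dS_i\, d\left( \frac{1}{N} \right) \right) \\
&= \sum_{i=1}^{k} \Delta_i\, d\hat S_i.
\end{aligned}$$

Now

$$\Delta_k = \frac{X - \sum_{i=1}^{k-1} \Delta_i S_i}{S_k} = \frac{X/N - \sum_{i=1}^{k-1} \Delta_i S_i / N}{S_k / N} = \frac{\hat X - \sum_{i=1}^{k-1} \Delta_i \hat S_i}{\hat S_k}.$$

Therefore,

$$d\hat X = \sum_{i=1}^{k-1} \Delta_i\, d\hat S_i + \left( \hat X - \sum_{i=1}^{k-1} \Delta_i \hat S_i \right) \frac{d\hat S_k}{\hat S_k}.$$

This is the formula for the evolution of a portfolio which holds $\Delta_i$ shares of asset $i$, $i = 1, 2, \dots, k-1$, and all assets and the portfolio are denominated in units of $N$.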


We return to the European call hedging problem (comparison of (3.1) and (3.2)), but we now use the zero-coupon bond as numéraire. We still hold $\Delta(t) = N(\rho_1(t))$ shares of stock at each time $t$. In terms of the new numéraire, the asset values are

Stock: $\frac{S(t)}{B(t,T)} = F(t)$,

Bond: $\frac{B(t,T)}{B(t,T)} = 1$.

The portfolio value evolves according to

$$d\hat X(t) = \Delta(t)\, dF(t) + \big( \hat X(t) - \Delta(t) F(t) \big) \frac{d(1)}{1} = \Delta(t)\, dF(t). \tag{3.1'}$$

In the new numéraire, the option value formula

$$V(t) = N(\rho_1(t)) S(t) - K B(t,T) N(\rho_2(t))$$

becomes

$$\hat V(t) = \frac{V(t)}{B(t,T)} = N(\rho_1(t)) F(t) - K N(\rho_2(t)),$$

and

$$d\hat V = N(\rho_1(t))\, dF(t) + F(t)\, dN(\rho_1(t)) + dN(\rho_1(t))\, dF(t) - K\, dN(\rho_2(t)). \tag{3.2'}$$

To show that the hedge works, we must show that

$$F(t)\, dN(\rho_1(t)) + dN(\rho_1(t))\, dF(t) - K\, dN(\rho_2(t)) = 0.$$

This is a homework problem.


Chapter 34 


Brace-Gatarek-Musiela model 


34.1 Review of HJM under risk-neutral $\mathbb{P}$

$f(t,T)$ = Forward rate at time $t$ for borrowing at time $T$.

$$df(t,T) = \sigma(t,T)\, \sigma^*(t,T)\, dt + \sigma(t,T)\, dW(t),$$

where

$$\sigma^*(t,T) = \int_t^T \sigma(t,u)\, du.$$

The interest rate is $r(t) = f(t,t)$. The bond prices

$$B(t,T) = \mathbb{E} \left[ \exp\left\{ -\int_t^T r(u)\, du \right\} \Big| \mathcal{F}(t) \right] = \exp\left\{ -\int_t^T f(t,u)\, du \right\}$$

satisfy

$$dB(t,T) = r(t) B(t,T)\, dt - \sigma^*(t,T) B(t,T)\, dW(t),$$

where $\sigma^*(t,T)$ is the volatility of the $T$-maturity bond.

To implement HJM, you specify a function

$$\sigma(t,T), \qquad 0 \le t \le T.$$

A simple choice we would like to use is

$$\sigma(t,T) = \sigma f(t,T),$$

where $\sigma > 0$ is the constant "volatility of the forward rate". This is not possible because it leads to

$$\sigma^*(t,T) = \sigma \int_t^T f(t,u)\, du,$$

$$df(t,T) = \sigma^2 f(t,T) \left( \int_t^T f(t,u)\, du \right) dt + \sigma f(t,T)\, dW(t),$$

and Heath, Jarrow and Morton show that solutions to this equation explode before $T$.

The problem with the above equation is that the $dt$ term grows like the square of the forward rate. To see what problem this causes, consider the similar deterministic ordinary differential equation

$$f'(t) = f^2(t),$$

where $f(0) = c > 0$. We have

$$\begin{aligned}
\frac{f'(t)}{f^2(t)} &= 1, \\
-\frac{d}{dt} \frac{1}{f(t)} &= 1, \\
-\frac{1}{f(t)} + \frac{1}{f(0)} &= \int_0^t 1\, du = t, \\
-\frac{1}{f(t)} &= t - \frac{1}{c} = \frac{ct - 1}{c}, \\
f(t) &= \frac{c}{1 - ct}.
\end{aligned}$$

This solution explodes at $t = 1/c$.


34.2 Brace-Gatarek-Musiela model 


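New variables:

Current time $t$,

Time to maturity $\tau = T - t$.

Forward rates:

$$r(t,\tau) = f(t, t+\tau), \qquad r(t,0) = f(t,t) = r(t), \tag{2.1}$$

$$\frac{\partial}{\partial \tau} r(t,\tau) = \frac{\partial}{\partial T} f(t, t+\tau). \tag{2.2}$$

Bond prices:

$$D(t,\tau) = B(t, t+\tau) \tag{2.3}$$

$$= \exp\left\{ -\int_t^{t+\tau} f(t,v)\, dv \right\}$$

$$(u = v - t,\ du = dv): \quad = \exp\left\{ -\int_0^\tau f(t, t+u)\, du \right\} = \exp\left\{ -\int_0^\tau r(t,u)\, du \right\},$$

$$\frac{\partial}{\partial \tau} D(t,\tau) = \frac{\partial}{\partial T} B(t, t+\tau) = -r(t,\tau) D(t,\tau). \tag{2.4}$$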


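We will now write $\sigma(t,\tau) = \sigma(t, T - t)$ rather than $\sigma(t,T)$. In this notation, the HJM model is

$$df(t,T) = \sigma(t,\tau)\, \sigma^*(t,\tau)\, dt + \sigma(t,\tau)\, dW(t), \tag{2.5}$$

$$dB(t,T) = r(t) B(t,T)\, dt - \sigma^*(t,\tau) B(t,T)\, dW(t), \tag{2.6}$$

where

$$\sigma^*(t,\tau) = \int_0^\tau \sigma(t,u)\, du, \tag{2.7}$$

$$\frac{\partial}{\partial \tau} \sigma^*(t,\tau) = \sigma(t,\tau). \tag{2.8}$$

We now derive the differentials of $r(t,\tau)$ and $D(t,\tau)$, analogous to (2.5) and (2.6). We have

$$\begin{aligned}
dr(t,\tau) &= \underbrace{df(t, t+\tau)}_{\text{differential applies only to first argument}} + \frac{\partial}{\partial T} f(t, t+\tau)\, dt \\
&\overset{(2.5),(2.2)}{=} \sigma(t,\tau)\, \sigma^*(t,\tau)\, dt + \sigma(t,\tau)\, dW(t) + \frac{\partial}{\partial \tau} r(t,\tau)\, dt \\
&\overset{(2.8)}{=} \frac{\partial}{\partial \tau} \left[ r(t,\tau) + \tfrac{1}{2} (\sigma^*(t,\tau))^2 \right] dt + \sigma(t,\tau)\, dW(t).
\end{aligned} \tag{2.9}$$

Also,

$$\begin{aligned}
dD(t,\tau) &= \underbrace{dB(t, t+\tau)}_{\text{differential applies only to first argument}} + \frac{\partial}{\partial T} B(t, t+\tau)\, dt \\
&\overset{(2.6),(2.4)}{=} r(t) B(t, t+\tau)\, dt - \sigma^*(t,\tau) B(t, t+\tau)\, dW(t) - r(t,\tau) D(t,\tau)\, dt \\
&\overset{(2.1)}{=} \left[ r(t,0) - r(t,\tau) \right] D(t,\tau)\, dt - \sigma^*(t,\tau) D(t,\tau)\, dW(t).
\end{aligned} \tag{2.10}$$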


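34.3 LIBOR

Fix $\delta > 0$ (say, $\delta = \frac{1}{4}$ year). An amount $D(t,\delta)$ (in dollars) invested at time $t$ in a $(t+\delta)$-maturity bond grows to one dollar at time $t + \delta$. $L(t,0)$ is defined to be the corresponding rate of simple interest:

$$D(t,\delta) \big( 1 + \delta L(t,0) \big) = 1,$$

$$1 + \delta L(t,0) = \frac{1}{D(t,\delta)} = \exp\left\{ \int_0^\delta r(t,u)\, du \right\},$$

$$L(t,0) = \frac{\exp\left\{ \int_0^\delta r(t,u)\, du \right\} - 1}{\delta}.$$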


34.4 Forward LIBOR 


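$\delta > 0$ is still fixed. At time $t$, agree to invest $\frac{D(t,\tau+\delta)}{D(t,\tau)}$ dollars at time $t + \tau$, with payback of one dollar at time $t + \tau + \delta$. Can do this at time $t$ by shorting $\frac{D(t,\tau+\delta)}{D(t,\tau)}$ bonds maturing at time $t + \tau$ and going long one bond maturing at time $t + \tau + \delta$. The value of this portfolio at time $t$ is

$$-\frac{D(t,\tau+\delta)}{D(t,\tau)}\, D(t,\tau) + D(t,\tau+\delta) = 0.$$

The forward LIBOR $L(t,\tau)$ is defined to be the simple (forward) interest rate for this investment:

$$\frac{D(t,\tau+\delta)}{D(t,\tau)} \big( 1 + \delta L(t,\tau) \big) = 1,$$

$$1 + \delta L(t,\tau) = \frac{D(t,\tau)}{D(t,\tau+\delta)} = \frac{\exp\left\{ -\int_0^\tau r(t,u)\, du \right\}}{\exp\left\{ -\int_0^{\tau+\delta} r(t,u)\, du \right\}} = \exp\left\{ \int_\tau^{\tau+\delta} r(t,u)\, du \right\},$$

$$L(t,\tau) = \frac{\exp\left\{ \int_\tau^{\tau+\delta} r(t,u)\, du \right\} - 1}{\delta}. \tag{4.1}$$

Connection with forward rates:

$$\frac{\partial}{\partial \delta} \exp\left\{ \int_\tau^{\tau+\delta} r(t,u)\, du \right\} \Bigg|_{\delta=0} = r(t,\tau+\delta) \exp\left\{ \int_\tau^{\tau+\delta} r(t,u)\, du \right\} \Bigg|_{\delta=0} = r(t,\tau),$$

so

$$f(t, t+\tau) = r(t,\tau) = \lim_{\delta \downarrow 0} \frac{\exp\left\{ \int_\tau^{\tau+\delta} r(t,u)\, du \right\} - 1}{\delta},$$

$$L(t,\tau) = \frac{\exp\left\{ \int_\tau^{\tau+\delta} r(t,u)\, du \right\} - 1}{\delta}, \qquad \delta > 0 \text{ fixed}. \tag{4.2}$$

$r(t,\tau)$ is the continuously compounded rate. $L(t,\tau)$ is the simple rate over a period of duration $\delta$.

We cannot have a log-normal model for $r(t,\tau)$ because solutions explode as we saw in Section 34.1. For fixed positive $\delta$, we can have a log-normal model for $L(t,\tau)$.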
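Formula (4.1) and the limit in (4.2) can be illustrated numerically. The Python sketch below computes forward LIBOR from a given forward-rate curve $u \mapsto r(t,u)$ at a fixed time $t$, using a midpoint rule for the integral; the two sample curves, the grid size, and the function name are assumptions chosen for illustration.

```python
import math

def forward_libor(r_curve, tau, delta, n=1000):
    """L(t, tau) = (exp(int_tau^{tau+delta} r(t, u) du) - 1) / delta, as in (4.1).

    r_curve is a function u -> r(t, u) for the fixed current time t.
    The integral is approximated with a composite midpoint rule on n cells.
    """
    h = delta / n
    integral = h * sum(r_curve(tau + (i + 0.5) * h) for i in range(n))
    return math.expm1(integral) / delta

# Hypothetical flat curve at 5% with a quarterly accrual period.
flat_libor = forward_libor(lambda u: 0.05, tau=1.0, delta=0.25)

# Hypothetical upward-sloping curve; a tiny delta approximates the limit in (4.2).
sloped = lambda u: 0.03 + 0.01 * u
short_period_libor = forward_libor(sloped, tau=1.0, delta=1e-6)
```

For the flat curve the simple rate is slightly above the continuously compounded 5%, and for the sloped curve the value with a very small $\delta$ is close to $r(t,1) = 0.04$.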


34.5 The dynamics of L(t, 7) 


We want to choose o(t,7), ¢ > 0, 7 > 0, appearing in (2.5) so that 
dL(t,T) = (...) dt+ L(t,r) y(t,7) dW(t) 


CHAPTER 34. Brace-Gatarek-Musiela model 339 
for some y(t,7), t > 0,7 > 0. This is the BGM model, and is a subclass of HJM models, 


corresponding to particular choices of o(t, 7). 


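Recall (2.9):

$$dr(t,u) = \frac{\partial}{\partial u}\Bigl[r(t,u) + \tfrac12\bigl(\sigma^*(t,u)\bigr)^2\Bigr]dt + \sigma(t,u)\,dW(t).$$

Therefore,

$$\begin{aligned}
d\Bigl(\int_{\tau}^{\tau+\delta} r(t,u)\,du\Bigr)
&= \int_{\tau}^{\tau+\delta} dr(t,u)\,du \\
&= \int_{\tau}^{\tau+\delta} \frac{\partial}{\partial u}\Bigl[r(t,u) + \tfrac12\bigl(\sigma^*(t,u)\bigr)^2\Bigr]du\,dt + \int_{\tau}^{\tau+\delta}\sigma(t,u)\,du\,dW(t) \\
&= \Bigl[r(t,\tau+\delta) - r(t,\tau) + \tfrac12\bigl(\sigma^*(t,\tau+\delta)\bigr)^2 - \tfrac12\bigl(\sigma^*(t,\tau)\bigr)^2\Bigr]dt \\
&\qquad + \bigl[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\bigr]dW(t)
\end{aligned} \tag{5.1}$$

and

$$\begin{aligned}
dL(t,\tau)
&\overset{(4.1)}{=} d\left[\frac{\exp\bigl\{\int_{\tau}^{\tau+\delta} r(t,u)\,du\bigr\} - 1}{\delta}\right] \\
&= \frac1\delta \exp\Bigl\{\int_{\tau}^{\tau+\delta} r(t,u)\,du\Bigr\}\, d\Bigl(\int_{\tau}^{\tau+\delta} r(t,u)\,du\Bigr)
 + \frac{1}{2\delta}\exp\Bigl\{\int_{\tau}^{\tau+\delta} r(t,u)\,du\Bigr\}\Bigl(d\int_{\tau}^{\tau+\delta} r(t,u)\,du\Bigr)^2 \\
&\overset{(4.1),(5.1)}{=} \frac1\delta\bigl[1+\delta L(t,\tau)\bigr]\times \Bigl\{\Bigl[r(t,\tau+\delta) - r(t,\tau) + \tfrac12\bigl(\sigma^*(t,\tau+\delta)\bigr)^2 - \tfrac12\bigl(\sigma^*(t,\tau)\bigr)^2\Bigr]dt \\
&\qquad + \bigl[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\bigr]dW(t) + \tfrac12\bigl[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\bigr]^2 dt\Bigr\} \\
&= \frac1\delta\bigl[1+\delta L(t,\tau)\bigr]\Bigl\{\bigl[r(t,\tau+\delta) - r(t,\tau)\bigr]dt
 + \sigma^*(t,\tau+\delta)\bigl[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\bigr]dt \\
&\qquad + \bigl[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\bigr]dW(t)\Bigr\}.
\end{aligned} \tag{5.2}$$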


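But

$$\begin{aligned}
\frac{\partial}{\partial\tau} L(t,\tau)
&= \frac{\partial}{\partial\tau}\left[\frac{\exp\bigl\{\int_{\tau}^{\tau+\delta} r(t,u)\,du\bigr\} - 1}{\delta}\right] \\
&= \frac1\delta\exp\Bigl\{\int_{\tau}^{\tau+\delta} r(t,u)\,du\Bigr\}\cdot\bigl[r(t,\tau+\delta) - r(t,\tau)\bigr] \\
&= \frac1\delta\bigl[1+\delta L(t,\tau)\bigr]\bigl[r(t,\tau+\delta) - r(t,\tau)\bigr].
\end{aligned}$$

Therefore,

$$dL(t,\tau) = \frac{\partial}{\partial\tau}L(t,\tau)\,dt + \frac1\delta\bigl[1+\delta L(t,\tau)\bigr]\bigl[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\bigr]\cdot\bigl[\sigma^*(t,\tau+\delta)\,dt + dW(t)\bigr].$$

Take $\gamma(t,\tau)$ to be given by

$$\gamma(t,\tau)\,L(t,\tau) = \frac1\delta\bigl[1+\delta L(t,\tau)\bigr]\bigl[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\bigr]. \tag{5.3}$$

Then

$$dL(t,\tau) = \Bigl[\frac{\partial}{\partial\tau}L(t,\tau) + \gamma(t,\tau)L(t,\tau)\,\sigma^*(t,\tau+\delta)\Bigr]dt + \gamma(t,\tau)L(t,\tau)\,dW(t). \tag{5.4}$$

Note that (5.3) is equivalent to

$$\sigma^*(t,\tau+\delta) = \sigma^*(t,\tau) + \frac{\delta L(t,\tau)\gamma(t,\tau)}{1+\delta L(t,\tau)}. \tag{5.3'}$$

Plugging this into (5.4) yields

$$dL(t,\tau) = \Bigl[\frac{\partial}{\partial\tau}L(t,\tau) + \gamma(t,\tau)L(t,\tau)\,\sigma^*(t,\tau) + \frac{\delta L^2(t,\tau)\gamma^2(t,\tau)}{1+\delta L(t,\tau)}\Bigr]dt + \gamma(t,\tau)L(t,\tau)\,dW(t). \tag{5.4'}$$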


34.6 Implementation of BGM 
Obtain the initial forward LIBOR curve

$$L(0,\tau), \qquad \tau \ge 0,$$

from market data. Choose a forward LIBOR volatility function (usually nonrandom)

$$\gamma(t,\tau), \qquad t \ge 0,\ \tau \ge 0.$$

Because LIBOR gives no rate information on time periods smaller than $\delta$, we must also choose a partial bond volatility function

$$\sigma^*(t,\tau), \qquad t \ge 0,\ 0 \le \tau < \delta,$$

for maturities less than $\delta$ from the current time variable $t$.

With these functions, we can for each $\tau \in [0,\delta)$ solve (5.4') to obtain

$$L(t,\tau), \qquad t \ge 0,\ 0 \le \tau < \delta.$$

Plugging the solution into (5.3'), we obtain $\sigma^*(t,\tau)$ for $\delta \le \tau < 2\delta$. We then solve (5.4') to obtain

$$L(t,\tau), \qquad t \ge 0,\ \delta \le \tau < 2\delta,$$

and we continue recursively.
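
The bond-volatility half of this recursion can be sketched in a few lines of Python. This is an illustrative assumption-laden sketch, not the notes' implementation: it works at one fixed time $t$, takes the forward LIBOR curve $\tau \mapsto L(t,\tau)$ as already known, and uses hypothetical names (`bond_volatility`, `sigma_star_short`, `libor`, `gamma`). It applies the relation $\sigma^*(t,\tau+\delta) = \sigma^*(t,\tau) + \delta L(t,\tau)\gamma(t,\tau)/(1+\delta L(t,\tau))$ a total of $k$ times, starting from a tenor in $[0,\delta)$.

```python
def bond_volatility(sigma_star_short, libor, gamma, delta, tau0, k):
    """Return sigma*(t, tau0 + k*delta) at one fixed time t.

    sigma_star_short(tau): chosen partial bond volatility for 0 <= tau < delta
    libor(tau):            forward LIBOR curve L(t, tau) at the fixed time t
    gamma(tau):            forward LIBOR volatility gamma(t, tau)
    """
    if not 0.0 <= tau0 < delta:
        raise ValueError("tau0 must lie in [0, delta)")
    value = sigma_star_short(tau0)
    for j in range(k):
        tau = tau0 + j * delta
        L = libor(tau)
        # one step of the recursion: sigma*(t, tau + delta) from sigma*(t, tau)
        value += delta * L * gamma(tau) / (1.0 + delta * L)
    return value

# Hypothetical inputs: zero short-tenor bond volatility, flat 5% LIBOR curve,
# constant gamma = 0.2, quarterly delta; volatility of the 0.75-year tenor.
vol_3 = bond_volatility(lambda tau: 0.0, lambda tau: 0.05, lambda tau: 0.2,
                        delta=0.25, tau0=0.0, k=3)
```

In a full simulation $L(t,\tau)$ is itself random and evolves in $t$, so a step like this would be repeated at every time step of the scheme used to solve for $L$.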


Remark 34.1 BGM is a special case of HJM with HJM's $\sigma^*(t,\tau)$ generated recursively by (5.3').
In BGM, $\gamma(t,\tau)$ is usually taken to be nonrandom; the resulting $\sigma^*(t,\tau)$ is random.


Remark 34.2 (5.4) (equivalently, (5.4')) is a stochastic partial differential equation because of the $\frac{\partial}{\partial\tau}L(t,\tau)$ term. This is not as terrible as it first appears. Returning to the HJM variables $t$ and $T$, set

$$K(t,T) = L(t,T-t).$$

Then

$$dK(t,T) = dL(t,T-t) - \frac{\partial}{\partial\tau}L(t,T-t)\,dt$$

and (5.4) and (5.4') become

$$\begin{aligned}
dK(t,T) &= \gamma(t,T-t)K(t,T)\bigl[\sigma^*(t,T-t+\delta)\,dt + dW(t)\bigr] \\
&= \gamma(t,T-t)K(t,T)\Bigl[\sigma^*(t,T-t)\,dt + \frac{\delta K(t,T)\gamma(t,T-t)}{1+\delta K(t,T)}\,dt + dW(t)\Bigr].
\end{aligned} \tag{6.1}$$


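Remark 34.3 From (5.3) we have

$$\gamma(t,\tau)L(t,\tau) = \bigl[1+\delta L(t,\tau)\bigr]\,\frac{\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)}{\delta}.$$

If we let $\delta \downarrow 0$, then

$$\gamma(t,\tau)L(t,\tau) \to \frac{\partial}{\partial\delta}\sigma^*(t,\tau+\delta)\Big|_{\delta=0} = \sigma(t,\tau),$$

and so

$$\gamma(t,T-t)K(t,T) \to \sigma(t,T-t).$$

We saw before (eq. 4.2) that as $\delta \downarrow 0$,

$$L(t,\tau) \to r(t,\tau) = f(t,t+\tau),$$

so

$$K(t,T) \to f(t,T).$$

Therefore, the limit as $\delta \downarrow 0$ of (6.1) is given by equation (2.5):

$$df(t,T) = \sigma(t,T-t)\bigl[\sigma^*(t,T-t)\,dt + dW(t)\bigr].$$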


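Remark 34.4 Although the $dt$ term in (6.1) has the term $\frac{\delta\gamma^2(t,T-t)K^2(t,T)}{1+\delta K(t,T)}$ involving $K^2$, solutions to this equation do not explode because

$$\frac{\delta\gamma^2(t,T-t)K^2(t,T)}{1+\delta K(t,T)} \le \frac{\delta\gamma^2(t,T-t)K^2(t,T)}{\delta K(t,T)} = \gamma^2(t,T-t)K(t,T).$$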
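
The bound in Remark 34.4 says that the apparently quadratic drift term grows at most linearly in $K$. A small Python sketch can make this concrete; the function names and the sample values below are hypothetical, chosen only for illustration.

```python
def quadratic_drift_term(K, gamma, delta):
    """The term delta * gamma^2 * K^2 / (1 + delta * K) from the dt part of (6.1)."""
    return delta * gamma ** 2 * K ** 2 / (1.0 + delta * K)

def linear_bound(K, gamma):
    """The dominating linear expression gamma^2 * K."""
    return gamma ** 2 * K

# Hypothetical values: gamma = 0.2, delta = 0.25, a wide range of positive K
pairs = [(quadratic_drift_term(K, 0.2, 0.25), linear_bound(K, 0.2))
         for K in (0.001, 0.05, 1.0, 100.0, 1.0e6)]
```

Each first entry of `pairs` stays below the corresponding second entry, and for large $K$ the ratio of the two approaches one, so the drift behaves like a linear function of $K$ rather than a quadratic one.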


34.7 Bond prices 


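Let $\beta(t) = \exp\bigl\{\int_0^t r(u)\,du\bigr\}$. From (2.6) we have

$$\begin{aligned}
d\Bigl(\frac{B(t,T)}{\beta(t)}\Bigr) &= \frac{1}{\beta(t)}\bigl[-r(t)B(t,T)\,dt + dB(t,T)\bigr] \\
&= -\frac{B(t,T)}{\beta(t)}\,\sigma^*(t,T-t)\,dW(t).
\end{aligned}$$

The solution $\frac{B(t,T)}{\beta(t)}$ to this stochastic differential equation is given by

$$\frac{B(t,T)}{\beta(t)B(0,T)} = \exp\Bigl\{-\int_0^t \sigma^*(u,T-u)\,dW(u) - \tfrac12\int_0^t \bigl(\sigma^*(u,T-u)\bigr)^2 du\Bigr\}.$$

This is a martingale, and we can use it to switch to the forward measure

$$\mathbb{P}_T(A) = \frac{1}{B(0,T)}\int_A \frac{1}{\beta(T)}\,d\mathbb{P} = \int_A \frac{B(T,T)}{\beta(T)B(0,T)}\,d\mathbb{P} \qquad \forall A \in \mathcal{F}(T).$$

Girsanov's Theorem implies that

$$W_T(t) = W(t) + \int_0^t \sigma^*(u,T-u)\,du, \qquad 0 \le t \le T,$$

is a Brownian motion under $\mathbb{P}_T$.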


34.8 Forward LIBOR under more forward measure 


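From (6.1) we have

$$\begin{aligned}
dK(t,T) &= \gamma(t,T-t)K(t,T)\bigl[\sigma^*(t,T-t+\delta)\,dt + dW(t)\bigr] \\
&= \gamma(t,T-t)K(t,T)\,dW_{T+\delta}(t),
\end{aligned}$$

so

$$K(t,T) = K(0,T)\exp\Bigl\{\int_0^t \gamma(u,T-u)\,dW_{T+\delta}(u) - \tfrac12\int_0^t \gamma^2(u,T-u)\,du\Bigr\}$$

and

$$\begin{aligned}
K(T,T) &= K(0,T)\exp\Bigl\{\int_0^T \gamma(u,T-u)\,dW_{T+\delta}(u) - \tfrac12\int_0^T \gamma^2(u,T-u)\,du\Bigr\} \\
&= K(t,T)\exp\Bigl\{\int_t^T \gamma(u,T-u)\,dW_{T+\delta}(u) - \tfrac12\int_t^T \gamma^2(u,T-u)\,du\Bigr\}.
\end{aligned} \tag{8.1}$$

We assume that $\gamma$ is nonrandom. Then

$$X(t) = \int_t^T \gamma(u,T-u)\,dW_{T+\delta}(u) - \tfrac12\int_t^T \gamma^2(u,T-u)\,du \tag{8.2}$$

is normal with variance

$$\rho^2(t) = \int_t^T \gamma^2(u,T-u)\,du$$

and mean $-\tfrac12\rho^2(t)$.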
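
Because $X(t)$ is normal with mean $-\tfrac12\rho^2(t)$ and variance $\rho^2(t)$, the terminal value $K(T,T) = K(t,T)\exp\{X(t)\}$ can be sampled directly under $\mathbb{P}_{T+\delta}$, without time stepping. The Python sketch below is illustrative only: the function names, the midpoint quadrature for $\rho^2(t)$, the constant $\gamma = 0.2$, the starting value $K(0,T) = 5\%$, and the fixed random seed are all assumptions.

```python
import math
import random

def integrated_variance(gamma, t, T, steps=1000):
    """Midpoint-rule approximation of rho^2(t) = int_t^T gamma(u, T-u)^2 du."""
    h = (T - t) / steps
    total = 0.0
    for i in range(steps):
        u = t + (i + 0.5) * h
        total += gamma(u, T - u) ** 2
    return total * h

def sample_terminal_libor(K_t, rho2, rng):
    """Draw K(T,T) = K(t,T) * exp(X(t)) with X(t) ~ N(-rho2/2, rho2)."""
    x = rng.gauss(-0.5 * rho2, math.sqrt(rho2))
    return K_t * math.exp(x)

rng = random.Random(0)                        # fixed seed for repeatability
rho2 = integrated_variance(lambda u, tau: 0.2, t=0.0, T=1.0)
samples = [sample_terminal_libor(0.05, rho2, rng) for _ in range(200_000)]
mean_K = sum(samples) / len(samples)          # should be close to K(0,T) = 0.05
```

The sample mean stays close to the starting value, reflecting the fact that $K(\cdot,T)$ has no drift under $\mathbb{P}_{T+\delta}$.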


34.9 Pricing an interest rate caplet 


Consider a floating rate interest payment settled in arrears. At time $T+\delta$, the floating rate interest payment due is $\delta L(T,0) = \delta K(T,T)$, the LIBOR at time $T$. A caplet protects its owner by requiring him to pay only the cap $\delta c$ if $\delta K(T,T) > \delta c$. Thus, the value of the caplet at time $T+\delta$ is $\delta\,(K(T,T) - c)^+$. We determine its value at times $0 \le t \le T+\delta$.


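Case I: $T \le t \le T+\delta$.

$$\begin{aligned}
C_{T+\delta}(t) &= \mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T+\delta)}\,\delta\,(K(T,T)-c)^+ \,\Big|\, \mathcal{F}(t)\Bigr] \\
&= \delta\,(K(T,T)-c)^+\,\mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T+\delta)} \,\Big|\, \mathcal{F}(t)\Bigr] \\
&= \delta\,(K(T,T)-c)^+\,B(t,T+\delta).
\end{aligned} \tag{9.1}$$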




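Case II: $0 \le t \le T$.
Recall that

$$\mathbb{P}_{T+\delta}(A) = \int_A Z(T+\delta)\,d\mathbb{P}, \qquad \forall A \in \mathcal{F}(T+\delta),$$

where

$$Z(t) = \frac{B(t,T+\delta)}{\beta(t)B(0,T+\delta)}.$$

We have

$$\begin{aligned}
C_{T+\delta}(t) &= \mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T+\delta)}\,\delta\,(K(T,T)-c)^+ \,\Big|\, \mathcal{F}(t)\Bigr] \\
&= \delta B(t,T+\delta)\,\underbrace{\frac{\beta(t)B(0,T+\delta)}{B(t,T+\delta)}}_{1/Z(t)}\,
\mathbb{E}\Bigl[\underbrace{\frac{B(T+\delta,T+\delta)}{\beta(T+\delta)B(0,T+\delta)}}_{Z(T+\delta)}\,(K(T,T)-c)^+ \,\Big|\, \mathcal{F}(t)\Bigr] \\
&= \delta B(t,T+\delta)\,\mathbb{E}_{T+\delta}\bigl[(K(T,T)-c)^+ \,\big|\, \mathcal{F}(t)\bigr].
\end{aligned}$$

From (8.1) and (8.2) we have

$$K(T,T) = K(t,T)\exp\{X(t)\},$$

where $X(t)$ is normal under $\mathbb{P}_{T+\delta}$ with variance $\rho^2(t) = \int_t^T \gamma^2(u,T-u)\,du$ and mean $-\tfrac12\rho^2(t)$. Furthermore, $X(t)$ is independent of $\mathcal{F}(t)$. Hence

$$C_{T+\delta}(t) = \delta B(t,T+\delta)\,\mathbb{E}_{T+\delta}\bigl[(K(t,T)\exp\{X(t)\} - c)^+ \,\big|\, \mathcal{F}(t)\bigr].$$

Set

$$\begin{aligned}
g(y) &= \mathbb{E}_{T+\delta}\bigl[(y\exp\{X(t)\} - c)^+\bigr] \\
&= y\,N\Bigl(\frac{1}{\rho(t)}\log\frac{y}{c} + \tfrac12\rho(t)\Bigr) - c\,N\Bigl(\frac{1}{\rho(t)}\log\frac{y}{c} - \tfrac12\rho(t)\Bigr).
\end{aligned}$$

Then

$$C_{T+\delta}(t) = \delta B(t,T+\delta)\,g(K(t,T)), \qquad 0 \le t \le T. \tag{9.2}$$

In the case of constant $\gamma$, we have

$$\rho(t) = \gamma\sqrt{T-t},$$

and (9.2) is called the Black caplet formula.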
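
The Black caplet formula is straightforward to code. The sketch below is a minimal Python illustration under stated assumptions, not a production pricer: the function names are hypothetical, $\gamma$ is taken constant, the standard normal distribution function $N$ is built from `math.erf`, and the inputs in the usage line (bond price 0.95, forward LIBOR and cap rate both 5%, $\delta = 0.25$, $\gamma = 0.2$, one year to reset) are made up.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g(y, c, rho):
    """E[(y * exp(X) - c)^+] for X ~ N(-rho^2/2, rho^2), with y > 0 and c > 0."""
    if rho <= 0.0:
        return max(y - c, 0.0)          # degenerate case: no randomness left
    d_plus = math.log(y / c) / rho + 0.5 * rho
    d_minus = d_plus - rho
    return y * norm_cdf(d_plus) - c * norm_cdf(d_minus)

def black_caplet(bond_price, K, c, delta, gamma, time_to_reset):
    """Caplet value delta * B(t, T+delta) * g(K(t,T)) with rho = gamma * sqrt(T - t)."""
    rho = gamma * math.sqrt(time_to_reset)
    return delta * bond_price * g(K, c, rho)

# Hypothetical at-the-money caplet
price = black_caplet(bond_price=0.95, K=0.05, c=0.05, delta=0.25,
                     gamma=0.2, time_to_reset=1.0)
```

When the forward LIBOR equals the cap rate, $g$ reduces to $y\,(2N(\rho/2) - 1)$, which gives a quick sanity check on any implementation.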


34.10 Pricing an interest rate cap 


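Let

$$T_0 = 0,\quad T_1 = \delta,\quad T_2 = 2\delta,\quad \dots,\quad T_n = n\delta.$$

A cap is a series of payments

$$\delta\,(K(T_k,T_k) - c)^+ \quad\text{at time } T_{k+1},\qquad k = 0,1,\dots,n-1.$$

The value at time $t$ of the cap is the value of all remaining caplets, i.e.,

$$C(t) = \sum_{k:\,t \le T_k} C_{T_k}(t).$$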


34.11 Calibration of BGM 


The interest rate caplet $c$ on $L(0,T)$ at time $T+\delta$ has time-zero value

$$C_{T+\delta}(0) = \delta B(0,T+\delta)\,g(K(0,T)),$$

where $g$ (defined in the last section) depends on

$$\int_0^T \gamma^2(u,T-u)\,du.$$

Let us suppose $\gamma$ is a deterministic function of its second argument, i.e.,

$$\gamma(t,\tau) = \gamma(\tau).$$

Then $g$ depends on

$$\int_0^T \gamma^2(T-u)\,du = \int_0^T \gamma^2(v)\,dv.$$

If we know the caplet price $C_{T+\delta}(0)$, we can "back out" the squared volatility $\int_0^T \gamma^2(v)\,dv$. If we know caplet prices

$$C_{T_0+\delta}(0),\ C_{T_1+\delta}(0),\ \dots,\ C_{T_n+\delta}(0),$$

where $T_0 < T_1 < \dots < T_n$, we can "back out"

$$\int_0^{T_0}\gamma^2(v)\,dv,\qquad \int_{T_0}^{T_1}\gamma^2(v)\,dv = \int_0^{T_1}\gamma^2(v)\,dv - \int_0^{T_0}\gamma^2(v)\,dv,\qquad \dots,\qquad \int_{T_{n-1}}^{T_n}\gamma^2(v)\,dv.$$

In this case, we may assume that $\gamma$ is constant on each of the intervals

$$(0,T_0),\ (T_0,T_1),\ \dots,\ (T_{n-1},T_n),$$

and choose these constants to make the above integrals have the values implied by the caplet prices.


If we know caplet prices $C_{T+\delta}(0)$ for all $T \ge 0$, we can "back out" $\int_0^T \gamma^2(v)\,dv$ and then differentiate to discover $\gamma^2(\tau)$ and $\gamma(\tau) = \sqrt{\gamma^2(\tau)}$ for all $\tau \ge 0$.
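
The "backing out" step can be sketched as follows. This Python fragment is a hedged illustration rather than the notes' procedure: the bisection solver, its bracket $[0, 4]$ for $\rho$, and all function names are assumptions, and the market inputs in the usage lines are invented. It inverts the caplet price for the integrated squared volatility and then converts a list of such integrals into piecewise-constant values of $\gamma$.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g(y, c, rho):
    """E[(y * exp(X) - c)^+] for X ~ N(-rho^2/2, rho^2), rho > 0."""
    d_plus = math.log(y / c) / rho + 0.5 * rho
    return y * norm_cdf(d_plus) - c * norm_cdf(d_plus - rho)

def implied_integrated_variance(price, bond_price, K, c, delta, rho_max=4.0):
    """Solve delta * bond_price * g(K, c, rho) = price for rho by bisection
    (g is increasing in rho); return rho^2, the integral of gamma^2 over (0, T)."""
    target = price / (delta * bond_price)
    lo, hi = 0.0, rho_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(K, c, mid) < target:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2

def piecewise_gamma(maturities, integrated_variances):
    """Constants for gamma on (0,T_0), (T_0,T_1), ... matching the given integrals."""
    gammas, prev_T, prev_v = [], 0.0, 0.0
    for T, v in zip(maturities, integrated_variances):
        if v < prev_v:
            raise ValueError("integrated variance must be nondecreasing in T")
        gammas.append(math.sqrt((v - prev_v) / (T - prev_T)))
        prev_T, prev_v = T, v
    return gammas

# Hypothetical round trip: price a caplet with rho = 0.25, then recover rho^2
quote = 0.25 * 0.95 * g(0.05, 0.05, 0.25)
recovered = implied_integrated_variance(quote, 0.95, 0.05, 0.05, 0.25)
steps = piecewise_gamma([1.0, 2.0], [0.04, 0.13])
```

In the usage lines, `recovered` should come back close to $0.25^2$, and `steps` gives $\gamma \approx 0.2$ on the first interval and $\gamma \approx 0.3$ on the second.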


To implement BGM, we need both $\gamma(\tau)$, $\tau \ge 0$, and

$$\sigma^*(t,\tau), \qquad t \ge 0,\ 0 \le \tau < \delta.$$

Now $\sigma^*(t,\tau)$ is the volatility at time $t$ of a zero coupon bond maturing at time $t+\tau$ (see (2.6)).
Since $\delta$ is small (say $\tfrac14$ year), and $0 \le \tau < \delta$, it is reasonable to set

$$\sigma^*(t,\tau) = 0, \qquad t \ge 0,\ 0 \le \tau < \delta.$$

We can now solve (or simulate) to get

$$L(t,\tau), \qquad t \ge 0,\ \tau \ge 0,$$

or equivalently,

$$K(t,T), \qquad t \ge 0,\ T \ge 0,$$

using the recursive procedure outlined at the start of Section 34.6.


34.12 Long rates 


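The long rate is determined by long maturity bond prices. Let $n$ be a large fixed positive integer, so that $n\delta$ is 20 or 30 years. Then

$$\frac{1}{D(t,n\delta)} = \exp\Bigl\{\int_0^{n\delta} r(t,u)\,du\Bigr\} = \prod_{k=1}^{n}\exp\Bigl\{\int_{(k-1)\delta}^{k\delta} r(t,u)\,du\Bigr\} = \prod_{k=1}^{n}\bigl[1+\delta L(t,(k-1)\delta)\bigr],$$

where the last equality follows from (4.1). The long rate is

$$\frac{1}{n\delta}\log\frac{1}{D(t,n\delta)} = \frac{1}{n\delta}\sum_{k=1}^{n}\log\bigl[1+\delta L(t,(k-1)\delta)\bigr].$$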
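
By (4.1), each factor $1+\delta L(t,\tau)$ equals $\exp\{\int_\tau^{\tau+\delta} r(t,u)\,du\}$, so the reciprocal of the long-maturity bond price $D(t,n\delta)$ is a product of such factors over consecutive periods, and the long rate is the average of their logarithms per unit time. A minimal Python sketch of that average follows; the function name `long_rate` and the flat test curve are assumptions made for illustration.

```python
import math

def long_rate(forward_libors, delta):
    """Continuously compounded yield over n*delta years, where forward_libors
    is the list [L(t, 0), L(t, delta), ..., L(t, (n-1)*delta)]."""
    n = len(forward_libors)
    # log1p(x) = log(1 + x); each term is the log of one period's growth factor
    return sum(math.log1p(delta * L) for L in forward_libors) / (n * delta)

# Hypothetical flat 5% continuously compounded curve over 30 years, quarterly periods
delta = 0.25
flat_libor = math.expm1(0.05 * delta) / delta     # (4.1) for a flat curve
rate_30y = long_rate([flat_libor] * 120, delta)
```

For a flat curve the long rate simply recovers the flat continuously compounded rate, which is a convenient check.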


34.13 Pricing a swap 


Let $T_0 \ge 0$ be given, and set

$$T_1 = T_0 + \delta,\quad T_2 = T_0 + 2\delta,\quad \dots,\quad T_n = T_0 + n\delta.$$




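The swap is the series of payments

$$\delta\,(L(T_k,0) - c) \quad\text{at time } T_{k+1},\qquad k = 0,1,\dots,n-1.$$

For $0 \le t \le T_0$, the value of the swap is

$$\sum_{k=0}^{n-1}\mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T_{k+1})}\,\delta\,(L(T_k,0) - c) \,\Big|\, \mathcal{F}(t)\Bigr].$$

Now

$$1 + \delta L(T_k,0) = \frac{1}{B(T_k,T_{k+1})},$$

so

$$L(T_k,0) = \frac1\delta\Bigl[\frac{1}{B(T_k,T_{k+1})} - 1\Bigr].$$

We compute

$$\begin{aligned}
\mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T_{k+1})}\,\delta\,(L(T_k,0) - c) \,\Big|\, \mathcal{F}(t)\Bigr]
&= \mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T_{k+1})}\Bigl(\frac{1}{B(T_k,T_{k+1})} - 1 - \delta c\Bigr) \,\Big|\, \mathcal{F}(t)\Bigr] \\
&= \mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T_k)B(T_k,T_{k+1})}\,\underbrace{\mathbb{E}\Bigl[\frac{\beta(T_k)}{\beta(T_{k+1})} \,\Big|\, \mathcal{F}(T_k)\Bigr]}_{B(T_k,T_{k+1})} \,\Big|\, \mathcal{F}(t)\Bigr] - (1+\delta c)B(t,T_{k+1}) \\
&= \mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T_k)} \,\Big|\, \mathcal{F}(t)\Bigr] - (1+\delta c)B(t,T_{k+1}) \\
&= B(t,T_k) - (1+\delta c)B(t,T_{k+1}).
\end{aligned}$$

The value of the swap at time $t$ is

$$\begin{aligned}
\sum_{k=0}^{n-1}&\mathbb{E}\Bigl[\frac{\beta(t)}{\beta(T_{k+1})}\,\delta\,(L(T_k,0) - c) \,\Big|\, \mathcal{F}(t)\Bigr] \\
&= \sum_{k=0}^{n-1}\bigl[B(t,T_k) - (1+\delta c)B(t,T_{k+1})\bigr] \\
&= B(t,T_0) - (1+\delta c)B(t,T_1) + B(t,T_1) - (1+\delta c)B(t,T_2) + \dots + B(t,T_{n-1}) - (1+\delta c)B(t,T_n) \\
&= B(t,T_0) - \delta c B(t,T_1) - \delta c B(t,T_2) - \dots - \delta c B(t,T_n) - B(t,T_n).
\end{aligned}$$

The forward swap rate $w_{T_0}(t)$ at time $t$ for maturity $T_0$ is the value of $c$ which makes the time-$t$ value of the swap equal to zero:

$$w_{T_0}(t) = \frac{B(t,T_0) - B(t,T_n)}{\delta\,\bigl[B(t,T_1) + \dots + B(t,T_n)\bigr]}.$$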


In contrast to the cap formula, which depends on the term structure model and requires estimation
of $\gamma$, the swap formula is generic.
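
Since the swap value and the forward swap rate depend only on the bond prices $B(t,T_0),\dots,B(t,T_n)$, both are easy to compute. The following Python sketch is illustrative: the function names and the discount curve in the usage lines (a flat 5% continuously compounded yield, $T_0 = 1$, quarterly payments for two years) are assumptions.

```python
import math

def swap_value(bonds, c, delta):
    """Time-t value of the swap paying delta * (L(T_k, 0) - c) at T_{k+1},
    k = 0, ..., n-1, where bonds = [B(t,T_0), B(t,T_1), ..., B(t,T_n)]."""
    return bonds[0] - bonds[-1] - delta * c * sum(bonds[1:])

def forward_swap_rate(bonds, delta):
    """The rate c that makes swap_value(bonds, c, delta) equal to zero."""
    return (bonds[0] - bonds[-1]) / (delta * sum(bonds[1:]))

# Hypothetical discount curve at time t = 0
delta = 0.25
bonds = [math.exp(-0.05 * (1.0 + delta * k)) for k in range(9)]
rate = forward_swap_rate(bonds, delta)
value_at_par = swap_value(bonds, rate, delta)   # zero up to rounding
```

No volatility input appears anywhere in these functions, which is the sense in which the swap formula is generic.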


